Sample from prior?

Is it possible to generate samples from an untrained model? This seems useful for assessing a prior. However, when I try using sample() with a model I have created that has hyperparameters, I get this error:

Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/joblib/externals/loky/backend/queues.py", line 151, in _feed
    obj, reducers=reducers)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/joblib/externals/loky/backend/reduction.py", line 145, in dumps
    p.dump(obj)
ValueError: must use protocol 4 or greater to copy this object; since __getnewargs_ex__ returned keyword arguments.

I found lots of answers about prior predictive sampling, but nothing for just prior sampling.

Prior predictive sampling (pm.sample_prior_predictive) is what you are looking for: it returns a dict containing both the samples from the priors and the prior predictive samples.


Sorry if I’m being stupid, but when I do help(pm.sampling) I don’t seem to have this method available. I have only sample_ppc. I’ve got pymc3 3.4.1 from pip installed. Am I using a version that’s too old?

Ohhh right, it is only on master (sorry about that). If you can wait 1-2 days, we are releasing 3.5 very soon.

Thanks! I’ll try pip installing from git for now.

A follow-up question: sample_prior_predictive is taking a very long time to run on my model. This surprises me, because my model is a causal, generative model. There's one layer of hyperparameters, then a layer of Gaussians whose parameters are weighted sums of the hyperparameters.
It should be possible for PyMC3 to sample from this distribution very quickly, by simple forward sampling, so I’m surprised that it takes so long.
Is there any way to “tell” PyMC3 to do simple forward sampling? Or do I need to write my own forward sampler for this model?
Thanks for all of your help!

P.S. I was thinking I could write my own forward sampler by looking at model.vars[n].get_parents(), but when I do that in my model I get [] for all of the variables. So is there something I should do to get the model’s links built?
I was thinking I could topologically sort the variables and then generate from them in sort order, and that would be faster than the current method (for my particular model).

Hand-written forward simulation is of course faster, as PyMC3 still needs to walk the graph and get/set the state of each RV. But the current implementation is already pretty fast (you can have a look at a previous experiment/implementation: https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/Miscellaneous/Test_sample_prior.ipynb).
If you write your own forward sampler, you should not use model.vars[n], as it would still be quite slow. Just use the random generators from scipy/numpy and do the forward pass yourself (again, see the notebook above).
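For what it's worth, a numpy-only forward sampler for a model shaped like the one described above might look like this. The priors, weight matrix, and names here are illustrative assumptions, not the actual model from the question:

```python
import numpy as np

rng = np.random.RandomState(42)

def forward_sample(n_samples, weights, sd=1.0):
    """Forward-sample a two-layer toy model: hyperparameters
    h ~ Normal(0, 1), then a Gaussian layer y ~ Normal(W @ h, sd).
    Sampling proceeds in topological order: parents first."""
    n_hyper = weights.shape[1]
    draws = []
    for _ in range(n_samples):
        h = rng.normal(0.0, 1.0, size=n_hyper)  # hyperparameter layer
        mu = weights @ h                        # weighted sums -> means
        y = rng.normal(mu, sd)                  # Gaussian layer
        draws.append((h, y))
    return draws

# Three observed Gaussians driven by two hyperparameters.
W = np.array([[1.0, 0.5],
              [0.0, 2.0],
              [1.0, 1.0]])
samples = forward_sample(1000, W)
```

Because each draw is just a couple of numpy calls, this avoids the per-variable graph bookkeeping entirely.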

Thank you. I think I have it working now.
One note: I was misled by the vars argument to sample_prior_predictive. This parameter, as far as I can tell, cannot be passed a list of variables, but must be passed a list of variable names. That definitely cost me some time to figure out. The easy solution would be to change the parameter name to varnames, and change the documentation from

vars : iterable
    Variables for which to compute the posterior predictive samples.

to

varnames : iterable
    Names of the variables for which to compute the posterior predictive samples.

Alternatively, you could keep the name and check whether vars already contains variable objects before treating it as a list of names and looking them up.
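That check could be as simple as normalizing the argument up front. The helper below is hypothetical, not part of PyMC3; it just illustrates the idea of accepting either form:

```python
def resolve_names(vars_or_names):
    """Normalize a mixed list to variable names: plain strings pass
    through, and anything else is assumed to be a variable object
    with a .name attribute (as PyMC3 RVs have)."""
    return [v if isinstance(v, str) else v.name for v in vars_or_names]

class _FakeRV:
    """Stand-in for a PyMC3 random variable, for demonstration."""
    def __init__(self, name):
        self.name = name

names = resolve_names([_FakeRV("a"), "b"])
```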

Oh yes you are right - would you like to send a pull request to improve the docstring?