Sampling from prior predictive distribution



Thanks for the pymc3 package, it’s really great.

I was wondering if it is possible with the API to sample from the prior predictive distribution? It can be useful either to generate synthetic data from a model, or to check before inference that the generative model makes sense.

Thanks!

Simulating Fake Data from a pymc3 model

This should be possible if you first build your model without specifying observed= on any node and then sample as usual. The unpleasantness is that (to my knowledge) when you want to perform inference, you will need to restate the entire model. The sampled decorator can help eliminate some of this tedium.


Nice, it works.

I’ve noticed that in that case, and more generally for a model with no observed variable, NUTS or another MCMC method is used as the default sampler. In that case, though, we could simply draw samples by following the DAG, and it would be a better sampler.

Does this basic “forward” sampler, which would only work for models with no observed variables, exist in the API?

Thanks!


I do not believe this “forward” sampler is available at the moment, but it could be a good addition.


I am not sure a “forward” sampler is a good idea. It would be highly inefficient except for simple, low-dimensional problems.


The sampled decorator is very cool. I’ve been using it for replicative prior-predictive samples, but because the DAG will use NUTS to sample from known distributions, the sampling is slow. I understand that the DAG may have DensityDists and other such weirdnesses, but if the generative model can be sampled using the underlying scipy.stats functions, things would be way faster.

For example, sampling a Poisson prior predictive with a rate parameter drawn from a half-normal(0, 4) is actually way slower than chaining those samples in scipy.stats.
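As a concrete sketch of that scipy.stats chaining (the sample size and seeds are arbitrary):

```python
import numpy as np
from scipy import stats

n = 1000

# Draw the rate parameter from its half-normal(0, 4) prior...
lam = stats.halfnorm(loc=0, scale=4).rvs(size=n, random_state=1)

# ...then draw one Poisson outcome per sampled rate. The result is
# a set of prior predictive samples of y, with no MCMC involved.
y = stats.poisson(lam).rvs(random_state=2)

print(y.shape)
```

Each element of `y` is a forward draw through the DAG, so there is no tuning, no proposals, and no autocorrelation between samples.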

Any thoughts on speedups?


Wow, looking back, I clearly didn’t understand prior predictive sampling :sweat_smile:

We are working on implementing it in the code base.
Meanwhile, you can use the code in this notebook.



I got confused too. There is this difference between posterior/prior predictive sampling and replicative sampling that I never understood at first…


The problem with the sampling for psample2 in the notebook is that the sampler clearly uses proposals, via Metropolis or whatnot; but it looks like the pull request will put an end to that and use the distributions’ RNGs!

So sweet!


Hi! Just to close the loop on this, a recent PR made it much easier to do forward sampling from a model with no data.

For the record, the function is pm.sample_prior_predictive, and not pm.sample_generative. It’s on the GitHub bleeding edge, but it looks like it’ll be included in the next release.