Sampling from the prior predictive distribution

Hello,

thanks for the pymc3 package, it’s really great.

I was wondering whether it is possible in the API to sample from the prior predictive distribution? It can be useful, either to generate synthetic data from a model, or to check before inference that the generative model makes sense.

Thanks!


This should be possible if you first build your model without specifying observed= on any node and then sample as usual. The unpleasantness is that (to my knowledge) when you want to perform inference, you will need to restate the entire model. The sampled decorator can help eliminate some of this tedium.
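For concreteness, here is a minimal sketch of that workflow. The model and data are made up for illustration, and I'm using the sd parameterization from the PyMC3 API of this era:

```python
import numpy as np
import pymc3 as pm

# State the model once without observed= to sample from the prior...
with pm.Model():
    rate = pm.HalfNormal("rate", sd=4)
    counts = pm.Poisson("counts", mu=rate)   # no observed=, so it gets sampled too
    prior_trace = pm.sample(1000)            # MCMC over the joint prior

# ...and restate the whole thing with observed= when you want to do inference.
data = np.array([3, 5, 2, 4])                # hypothetical observations
with pm.Model():
    rate = pm.HalfNormal("rate", sd=4)
    counts = pm.Poisson("counts", mu=rate, observed=data)
    trace = pm.sample(1000)
```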

Nice, it works.

I’ve noticed that in that case, and more generally for a model with no observed variable, NUTS or another MCMC sampler is used by default. In that case, though, we could simply draw samples by following the DAG, and that would provide a better sampler.
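To illustrate what I mean by following the DAG, here is a toy sketch in plain NumPy (the model is hypothetical):

```python
import numpy as np

# "Forward" (ancestral) sampling: traverse the DAG in topological order and
# draw each node conditioned on the values already drawn for its parents.
np.random.seed(0)                          # hypothetical seed, for reproducibility
mu = np.random.normal(0, 1)                # root node: mu ~ Normal(0, 1)
sigma = np.abs(np.random.normal(0, 4))     # root node: sigma ~ HalfNormal(4)
y = np.random.normal(mu, sigma, size=100)  # child node: y | mu, sigma
```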

Does this basic “forward” sampler, which would only work for models with no observed variable, exist in the API?

Thanks!

I do not believe this “forward” sampler is available at the moment, but it could be a good addition.

I am not sure a “forward” sampler is a good idea. It will be highly inefficient except for simple, low-dimensional problems.

The sampled decorator is very cool. I’ve been using it for replicative prior-predictive samples, but because the DAG will use NUTS to sample from known distributions, the sampling is slow. I understand that the DAG may have DensityDists and other such weirdnesses, but if the generative model can be sampled using the underlying scipy.stats functions, things would be way faster.

For example, sampling a Poisson prior predictive with a rate parameter drawn from a half-normal(0, 4) is actually way slower than chaining those draws in scipy.stats.
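Concretely, the scipy.stats chaining I mean looks like this (a sketch, assuming the half-normal has scale 4):

```python
from scipy import stats

# Chain the draws: rate ~ HalfNormal(4), then counts | rate ~ Poisson(rate).
rates = stats.halfnorm(scale=4).rvs(size=5000)
prior_predictive = stats.poisson(mu=rates).rvs()  # one count per drawn rate
```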

Any thoughts on speedups?

Wow, looking back I clearly didn’t understand prior predictive sampling :sweat_smile:

We are working on implementing it in the code base: https://github.com/pymc-devs/pymc3/pull/2876
Meanwhile, you can use the code in this notebook.

Thanks!

I had gotten confused too. There is this difference between posterior/prior predictive sampling and replicative sampling that I never understood at first…

The problem with the sampling for psample2 in the notebook is that the sampler clearly uses proposals, sampling with Metropolis or whatnot; but it looks like the pull request will put an end to that and use the distributions’ rng!

So sweet!


Hi! Just to close the loop on this, a recent PR made it much easier to do forward sampling from a model with no data.

For the record, the function is pm.sample_prior_predictive, and not pm.sample_generative. It’s on the GitHub bleeding edge, but it looks like it’ll be included in the next release.
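Here’s a quick sketch of how to call it (the model itself is just a made-up example):

```python
import pymc3 as pm

with pm.Model():
    rate = pm.HalfNormal("rate", sd=4)
    counts = pm.Poisson("counts", mu=rate, shape=100)
    prior = pm.sample_prior_predictive(samples=500)

# prior is a dict mapping variable names to arrays of forward draws:
prior["rate"].shape    # (500,)
prior["counts"].shape  # (500, 100)
```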


I’m not sure I fully understand how sample_prior_predictive works. Is there something about the PyMC3 language that ensures that any model is a Bayes network, and so can support forward sampling? An earlier comment in this thread uses the term “DAG”, but it wasn’t clear to me that all PyMC3 models are necessarily DAGs. I guess if the variable constructors are executed eagerly rather than lazily, then all the parents of a node must exist before the node is created, so the graph must be acyclic. Is that correct?

I have read the code more deeply, and now I am definitely confused. As @antoinebaker points out,

This is because a forward sampler can draw the value of each node conditioned on the values of its parents, and no rejection is necessary, since there are no observations that could force you to reject a sample as inconsistent with the evidence.

But when I look at _draw_values, there’s a case for variables that do have observations.

Is that just to make the prior predictive sampler generate the same number of samples as there are observations? Or is it trying to make a sample that is consistent with the observations? I assume it’s the former, since otherwise we would need to filter out inconsistent samples. It would be nice if there were some comments in the body of _draw_values to clarify this.

Yes. I should have put more comments there when I added that part…

Hi,

I just did as you said, and was surprised to find that in the result of pm.summary, the mu and sigma of some betas are not standard normally distributed (they were defined to be standard normal in my model). Could you please help me understand this? Thank you so much!