Sampling from prior predictive distribution



Thanks for the pymc3 package, it’s really great.

I was wondering if it is possible with the API to sample from the prior predictive distribution? It can be useful either to generate synthetic data from a model, or to check before inference that the generative model makes sense.

Thanks!

Simulating Fake Data from a pymc3 model

This should be possible if you first build your model without specifying observed= on any node and then sample as usual. The unpleasantness is that (to my knowledge) when you want to perform inference, you will need to restate the entire model. The sampled decorator can help eliminate some of this tedium.


Nice, it works.

I’ve noticed that in that case, and more generally for a model with no observed variable, NUTS or another MCMC method is used as the default sampler. In that case, though, we could simply draw samples by following the DAG, and it would be a better sampler.

Does this basic “forward” sampler, which would only work for models with no observed variables, exist in the API?

Thanks!


I do not believe this “forward” sampler is available at the moment, but it could be a good addition.


I am not sure a “forward” sampler is a good idea. It would be highly inefficient except for simple, low-dimensional problems.


The sampled decorator is very cool. I’ve been using it for replicative prior-predictive samples, but because the DAG will use NUTS to sample from known distributions, the sampling is slow. I understand that the DAG may have DensityDists and other such weirdnesses, but if the generative model can be sampled using the underlying scipy.stats functions, things would be way faster.

For example, sampling a Poisson prior predictive with a rate parameter drawn from a half-normal(0, 4) is actually way slower than chaining those samples in scipy.stats.
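As a concrete sketch of that scipy.stats chaining (the sample size and seeds are arbitrary):

```python
import numpy as np
from scipy import stats

n = 1000

# Draw the rate parameter from its half-normal(0, 4) prior...
lam = stats.halfnorm(loc=0, scale=4).rvs(size=n, random_state=1)

# ...then draw one Poisson outcome per sampled rate. The result is
# a set of prior predictive samples of y, with no MCMC involved.
y = stats.poisson(lam).rvs(random_state=2)

print(y.shape)
```

Each element of `y` is a forward draw through the DAG, so there is no tuning, no proposals, and no autocorrelation between samples.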

Any thoughts on speedups?


Wow, looking back, I clearly didn’t understand prior predictive sampling :sweat_smile:

We are working on implementing it in the code base.
Meanwhile, you can use the code in this notebook.



I got confused too. There is this difference between posterior/prior predictive sampling and replicative sampling that I never understood at first…


The problem with the sampling for psample2 in the notebook is that the sampler clearly uses proposals, via Metropolis or whatnot; but it looks like the pull request will put an end to that and use the distributions’ RNGs!

So sweet!


Hi! Just to close the loop on this, a recent PR made it much easier to do forward sampling from a model with no data.

For the record, the function is pm.sample_prior_predictive, and not pm.sample_generative. It’s on the GitHub bleeding edge, but it looks like it’ll be included in the next release.