I was skimming through the code in pymc4.sample_prior_predictive, and I wanted to clarify some concepts.
I realized that it always returns all the results in the prior_predictive group of the inference data object, which I think will be confusing for users (especially as the importance of both prior and prior predictive checks increases) and can also make it harder to use all of ArviZ's features.
Would it be possible to divide the variables into prior and prior_predictive groups, in the same way as variables are divided between posterior and posterior_predictive?
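To make the proposal concrete, here is a minimal sketch in plain Python (the function name and dict layout are hypothetical, not PyMC4 API) of how a flat dict of forward samples could be partitioned into the two groups using the model's observed variable names:

```python
def split_prior_samples(samples, observed_names):
    """Partition a flat dict of forward samples into InferenceData-style
    prior and prior_predictive groups, keyed on observed variable names."""
    prior = {name: draws for name, draws in samples.items()
             if name not in observed_names}
    prior_predictive = {name: draws for name, draws in samples.items()
                        if name in observed_names}
    return prior, prior_predictive

# Example: "theta" is a latent variable, "y" is observed in the model
samples = {"theta": [0.1, 0.4, -0.2], "y": [1.0, 0.3, 0.8]}
prior, prior_pred = split_prior_samples(samples, observed_names={"y"})
print(sorted(prior))       # ['theta']
print(sorted(prior_pred))  # ['y']
```

The two resulting dicts map directly onto the `prior` and `prior_predictive` groups of an InferenceData object (e.g. via `arviz.from_dict`), mirroring the existing posterior / posterior_predictive split.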
I have found that conceptually distinguishing between prior and prior_predictive is generally harder than distinguishing between posterior and posterior_predictive, and keeping them combined in PyMC4 will probably keep the confusion alive. Below I list the two main arguments for keeping both quantities combined that came to mind, because I am not sure I completely grasp the whole situation.
I know that both quantities can be sampled at the same time, and therefore doing something like prior = pm.sample_prior(model); prior_pred = pm.sample_prior_predictive(prior, model) is not efficient at all. However, both quantities can be sampled at the same time and still each be stored in the corresponding group of the resulting inference data. When the return value is an inference data object, computational efficiency and storing them in different groups seem perfectly compatible.
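As a sketch of this point, assume a toy model theta ~ Normal(0, 1), y ~ Normal(theta, 1). Both quantities fall out of a single forward (ancestral) pass, and nothing prevents storing them in separate groups afterwards:

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 1000

# One forward pass: each y* reuses the theta drawn alongside it,
# so no second sampling pass over the prior is needed.
theta = rng.normal(0.0, 1.0, size=n_draws)  # prior draws of theta
y_star = rng.normal(theta, 1.0)             # prior predictive draws of y

# The same single pass can still populate two separate groups of the
# returned inference data (group names as in ArviZ's InferenceData).
groups = {
    "prior": {"theta": theta},
    "prior_predictive": {"y": y_star},
}
```

Efficiency comes from drawing everything in one pass; which group each array lands in is purely a bookkeeping decision made after the fact.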
I have seen the sample_from_observed argument, which may make distinguishing between the two quantities difficult; however, I have not been able to understand what it does conceptually. To me, neither the prior p(\theta) nor the prior predictive p(y^*) = \int p(y^*|\theta) p(\theta) d\theta knows about the observed data y, so I can't wrap my head around what is computed by sample_from_observed=False: we only get samples of \theta (prior/posterior variables), and their distribution is somehow conditional on the observed data y, but it clearly isn't the posterior.
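To illustrate why neither quantity can depend on the observed y, here is a minimal Monte Carlo sketch of the prior predictive integral \int p(y^*|\theta) p(\theta) d\theta for the same toy model (theta ~ Normal(0, 1), y^* | theta ~ Normal(theta, 1)): ancestral sampling never touches any observed data, and the marginal of y^* comes out as Normal(0, 2).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Estimate \int p(y*|theta) p(theta) dtheta by ancestral sampling:
theta = rng.normal(0.0, 1.0, size=n)  # theta ~ p(theta)
y_star = rng.normal(theta, 1.0)       # y* ~ p(y*|theta)

# The marginal of y* is Normal(0, 1 + 1); no observed y enters anywhere.
print(y_star.mean())  # ≈ 0
print(y_star.var())   # ≈ 2
```

Any scheme that makes these draws "conditional on" the observed y is computing something else entirely, which is exactly the confusion about sample_from_observed described above.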