Posterior predictive check question

Ah, I think I've now come a step further. To make it clearer to myself, I switched to the prior predictive, because it does not even have any free parameters (both mu and sigma are fixed), and nevertheless the plot_ppc function puts out KDEs of very different heights.

Also, your explanations made clear to me the effect of the sample size of the observations on the predictive samples. I guess the process works as follows:

  • Let’s assume I observed only 10 data points and passed these to the likelihood.
  • Now I call idata.extend(pm.sample_prior_predictive(return_inferencedata=True, samples=33)).
  • This leads to 33 sets of “fake observed data”, each of size 10, because that was the size of my observed data.
  • Although the generating process now does not have any free parameters, it of course still has the randomness of the Normal.
  • Because each set contains only 10 draws, I get 33 KDEs that look very different.
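
The steps above can be sketched with plain NumPy, without any PyMC machinery. The fixed values mu=0 and sigma=1 are just an assumption for illustration; the point is only the shape of the result:

```python
import numpy as np

# Assumed fixed prior parameters (placeholders, not from the actual model)
mu, sigma = 0.0, 1.0
n_obs = 10    # size of the observed data passed to the likelihood
n_sets = 33   # number of prior predictive draws requested

rng = np.random.default_rng(42)
# Each row is one "fake observed dataset" of the same size as the real data
fake_data = rng.normal(mu, sigma, size=(n_sets, n_obs))

print(fake_data.shape)  # (33, 10)
```

Each of the 33 rows is what one of the KDE curves in plot_ppc is estimated from, so with only 10 points per row the curves scatter a lot.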

What I obviously still have to learn is to separate terminology like “draws”, “samples”, and so forth. For example, when I say pm.sample_prior_predictive(return_inferencedata=True, **samples**=33), the result in the prior_predictive group will be 33 “draws”, each of the size of the observed data that I passed to the likelihood. And if I see it correctly, I cannot change this size through any parameter of the sample_prior_predictive function; it is always determined by the size of the observed_data of the likelihood.
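If I understand the layout correctly, the prior_predictive variable has dimensions (chain, draw, observation), which a NumPy array can emulate (the dimension sizes below are assumed from the example with 33 draws and 10 observed points):

```python
import numpy as np

rng = np.random.default_rng(0)
# Emulate idata.prior_predictive["obs"].values: dims (chain, draw, obs)
prior_pred = rng.normal(0.0, 1.0, size=(1, 33, 10))

# values[0][k] then picks draw k of chain 0: one fake dataset of length 10
draw_1 = prior_pred[0][1]
print(draw_1.shape)  # (10,)
```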

I simulated the plot_ppc function (for the prior_predictive data) by using something like

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

fig, ax = plt.subplots(1, 1, figsize=std_figsize)
# one KDE per prior predictive draw (chain 0, draws 1-3), each from only 10 points
sns.kdeplot(pd.DataFrame(idata.prior_predictive["obs"].values[0][1]), ax=ax)
sns.kdeplot(pd.DataFrame(idata.prior_predictive["obs"].values[0][2]), ax=ax)
sns.kdeplot(pd.DataFrame(idata.prior_predictive["obs"].values[0][3]), ax=ax)

which then shows this effect of varying KDE shapes.
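
The effect can also be quantified instead of eyeballed: with NumPy stand-ins (again assuming a standard Normal), the per-draw means scatter much more for 10 points per draw than for 1000, which is exactly why the small-sample KDEs look so different from each other:

```python
import numpy as np

rng = np.random.default_rng(1)
spread = {}
for n in (10, 1000):
    draws = rng.normal(0.0, 1.0, size=(33, n))
    # how much the 33 per-draw means scatter around the true mean
    spread[n] = draws.mean(axis=1).std()

print(spread)  # spread[10] is much larger than spread[1000]
```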

Thanks for your help!