Posterior predictive check question

Let me be slightly more careful here. In your example, there are several notions of “the observed values”. You have observed_data, and you have likelihood = pm.Normal("obs"... You then connect the two through the observed kwarg, telling the model that the latter should be related to the former. When you perform posterior predictive sampling, you ask for your likelihood/obs variable to be resampled: what values could have been observed, conditional on your posterior? For each posterior draw, you get one vector of credible values of likelihood/obs, with the same length as observed_data.
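To make "one vector per posterior draw" concrete, here is a minimal NumPy-only sketch. It is not your PyMC model: it assumes sigma is known and puts a Normal(0, 10) prior on the mean (both are my assumptions), so the posterior has a closed form and stands in for what pm.sample would return. The names known_sigma, prior_mean, prior_sd and mu_draws are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for observed_data: 10 draws from Normal(5, 2).
observed_data = rng.normal(loc=5, scale=2, size=10)

# ASSUMPTIONS (not taken from the original model): sigma is treated as known
# and mu gets a Normal(0, 10) prior, so the posterior for mu is closed-form.
known_sigma = 2.0
prior_mean, prior_sd = 0.0, 10.0

n = observed_data.size
post_precision = 1 / prior_sd**2 + n / known_sigma**2
post_mean = (prior_mean / prior_sd**2
             + observed_data.sum() / known_sigma**2) / post_precision
post_sd = np.sqrt(1 / post_precision)

# Posterior draws of mu (the role played by the MCMC samples in PyMC).
n_draws = 1000
mu_draws = rng.normal(loc=post_mean, scale=post_sd, size=n_draws)

# Posterior predictive: for each posterior draw of mu, simulate one data set
# with the same length as observed_data.
posterior_predictive = rng.normal(loc=mu_draws[:, None],
                                  scale=known_sigma,
                                  size=(n_draws, n))

print(posterior_predictive.shape)  # (1000, 10)
```

Each row of posterior_predictive is one "could have been observed" data set, which is what each curve in a PPC plot is built from.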

Perhaps a different example might help. Let’s use the same data generating process, but only generate a small number of observations:

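import numpy as np

# Parameters of the true data generating process
true_mean_of_generating_process = 5
true_sigma_of_generating_process = 2

# Only 10 observations this time
observed_data = np.random.normal(loc=true_mean_of_generating_process,
                                 scale=true_sigma_of_generating_process,
                                 size=10)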

Now the PPC procedure generates something like this:


These individual curves look nothing like normal distributions, but that’s because each one is a KDE based on 10 data points (each of which is a draw from an actual normal distribution).
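If you want to convince yourself of this without a model at all, here is a rough sketch that compares a KDE built from 10 normal draws with one built from many draws. The KDE is hand-rolled with a Silverman-style rule-of-thumb bandwidth, which is my choice for illustration and not necessarily what the plotting code uses; gaussian_kde_curve, normal_pdf and mean_max_error are hypothetical helper names.

```python
import numpy as np

def gaussian_kde_curve(samples, grid):
    """Gaussian KDE evaluated on grid, with a rule-of-thumb bandwidth."""
    samples = np.asarray(samples, dtype=float)
    bandwidth = 1.06 * samples.std(ddof=1) * samples.size ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

def normal_pdf(grid, mean, sigma):
    """Density of Normal(mean, sigma) evaluated on grid."""
    return np.exp(-0.5 * ((grid - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
grid = np.linspace(-5, 15, 201)
true_curve = normal_pdf(grid, 5, 2)

def mean_max_error(sample_size, n_repeats=20):
    """Average (over repeats) of the largest gap between KDE and true density."""
    errors = []
    for _ in range(n_repeats):
        draws = rng.normal(5, 2, size=sample_size)
        kde = gaussian_kde_curve(draws, grid)
        errors.append(np.abs(kde - true_curve).max())
    return float(np.mean(errors))

error_small = mean_max_error(10)    # like a single PPC curve here
error_large = mean_max_error(2000)  # what you would see with lots of data
print(error_small, error_large)
```

The KDEs from 10 points sit much further from the true normal curve than the ones from 2000 points, even though every draw comes from the same normal distribution.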

Does that help clarify at all?