What does it mean when the posterior predictive mean falls outside the posterior predictive samples?


A general question regarding the posterior predictive: what does it imply when the posterior predictive mean lies outside the predictive samples themselves? As you can see below, at the lower end of the x axis the mean is well outside the posterior predictive samples. I’m not sure what that implies, other than that something is wrong with my model. Has anyone seen this before?

I think this is an ArviZ plotting issue. The binning resolution appears to differ between the posterior predictive lines and the posterior predictive mean.

Indeed. We could look into enforcing the same bins for the “posterior predictive mean” and all “posterior predictive” lines, maybe also for the “observed” one, or perhaps only enforcing the same bins for “observed” and “posterior predictive mean”.
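
A minimal NumPy sketch of the shared-bins idea, assuming the posterior predictive is available as a 2-D array with one row per posterior sample. The names `y_obs` and `pp`, the simulated Poisson data, and the choice to take the bin width from the observed data while stretching the range over all lines are illustrative assumptions, not how ArviZ currently does it:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 150 observed counts and 500 posterior predictive draws of them
y_obs = rng.poisson(lam=20, size=150)
pp = rng.poisson(lam=20, size=(500, 150))

# One shared set of edges: the range covers every line, so no value is dropped,
# while the bin width is chosen from the observed data only
lo = min(y_obs.min(), pp.min())
hi = max(y_obs.max(), pp.max())
edges = np.histogram_bin_edges(y_obs, bins="auto", range=(lo, hi))

# Every line is binned with exactly the same edges
obs_dens, _ = np.histogram(y_obs, bins=edges, density=True)
pp_dens = np.array([np.histogram(row, bins=edges, density=True)[0] for row in pp])
mean_dens, _ = np.histogram(pp.ravel(), bins=edges, density=True)  # pooled "mean" line
```

Taking the range from all lines matters: `np.histogram` silently drops values outside the edges, so edges built from the observed data alone would hide any predictive tail that extends beyond the data.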

Right now, ArviZ defaults to choosing the optimal bins (both their number and their edges) for each line independently of the other plotted lines. Because the bins depend on the range of the values and on how many values are binned, this is not an issue when the histograms are generated from data with the same number of values. You can see that all “posterior predictive” lines are very similar even though they don’t share exactly the same bins, so the difference between the “posterior predictive” and “observed” lines is meaningful: the model is not fully capturing the dispersion of the observations. The “posterior predictive mean”, however, is generated from all posterior predictive samples across all chains and draws, so it bins many more values, and the optimal choice there uses more, narrower bins.
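
To see the effect in isolation, the sketch below bins one posterior predictive line and the pooled samples independently. It uses NumPy's Freedman-Diaconis rule (`bins="fd"`) as a stand-in for ArviZ's own bin selection, and the simulated gamma data and array shapes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_obs = 1000, 100  # hypothetical chains*draws and number of observations
# Hypothetical posterior predictive draws, one row per posterior sample
pp = rng.gamma(shape=2.0, scale=500.0, size=(n_samples, n_obs))

# Bins chosen independently for each line
edges_one_line = np.histogram_bin_edges(pp[0], bins="fd")     # 100 values
edges_pooled = np.histogram_bin_edges(pp.ravel(), bins="fd")  # 100 000 values

n_bins_one, n_bins_pooled = len(edges_one_line) - 1, len(edges_pooled) - 1
width_one = edges_one_line[1] - edges_one_line[0]
width_pooled = edges_pooled[1] - edges_pooled[0]
print(f"one line: {n_bins_one} bins of width {width_one:.0f}")
print(f"pooled:   {n_bins_pooled} bins of width {width_pooled:.0f}")
```

The Freedman-Diaconis width scales as 2·IQR·n^(-1/3), so pooling 1000 draws shrinks the bin width by roughly a factor of 10 (1000^(1/3)), which matches the finer resolution of the “posterior predictive mean” line.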

Note that the current behaviour does have one nice feature: each line starts and ends at the min and max values of the samples it was given. Therefore, the “posterior predictive mean” line ending around 3500 is a clear indicator that no posterior predictive sample can account for the observations around 4k or 5k, which means the model has an issue there.
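
The same check can be done numerically: flag observations that fall outside the min/max of all pooled posterior predictive samples. A minimal sketch; `obs_outside_pp_range` is a hypothetical helper, and the simulated data are chosen only to reproduce the situation described above:

```python
import numpy as np

def obs_outside_pp_range(y_obs, pp):
    """Flag observations outside [min, max] of all posterior predictive samples.

    pp is assumed to have shape (n_samples, n_obs); it is pooled over all
    samples, like the "posterior predictive mean" line.
    """
    lo, hi = pp.min(), pp.max()
    return (y_obs < lo) | (y_obs > hi), lo, hi

rng = np.random.default_rng(2)
# Hypothetical model whose predictions never reach the upper tail of the data
y_obs = np.array([800.0, 1200.0, 1500.0, 2100.0, 4200.0, 5100.0])
pp = rng.normal(loc=1500.0, scale=400.0, size=(1000, y_obs.size))

outside, lo, hi = obs_outside_pp_range(y_obs, pp)
print(f"pooled predictive range: [{lo:.0f}, {hi:.0f}]")
print("observations outside it:", y_obs[outside])
```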

Personally, though, my main recommendation is to default to ECDF plots whenever possible. Here that means using kind="cumulative" when calling plot_ppc. It is also generally a good idea to combine it with plot_bpv or plot_loo_pit.
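
Why ECDFs sidestep the problem: an empirical CDF needs no bin choice, so a single posterior predictive draw, the pooled samples and the observations can all be evaluated on the same grid. A minimal NumPy sketch, assuming a (n_samples, n_obs) array; the helpers `ecdf` and `u_values` are hypothetical, and the per-observation u-values only follow the idea behind the u-value check in plot_bpv, without the PSIS-LOO weighting that plot_loo_pit adds:

```python
import numpy as np

def ecdf(values, grid):
    """Fraction of values <= each grid point (no binning involved)."""
    values = np.sort(np.ravel(values))
    return np.searchsorted(values, grid, side="right") / values.size

def u_values(y_obs, pp):
    """Per-observation fraction of posterior predictive draws <= the observed value."""
    return (pp <= y_obs).mean(axis=0)

rng = np.random.default_rng(3)
# Hypothetical, well-specified toy model: data and predictions share a distribution
y_obs = rng.normal(0.0, 1.0, size=200)
pp = rng.normal(0.0, 1.0, size=(1000, 200))

grid = np.linspace(-3, 3, 61)
obs_curve = ecdf(y_obs, grid)
pooled_curve = ecdf(pp, grid)      # same grid, no resolution mismatch
one_draw_curve = ecdf(pp[0], grid)

# For a calibrated model the u-values should look roughly uniform on [0, 1]
u = u_values(y_obs, pp)
print("max |ECDF obs - ECDF pooled|:", np.abs(obs_curve - pooled_curve).max())
print("u-value deciles:", np.round(np.quantile(u, np.linspace(0.1, 0.9, 9)), 2))
```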
