Hi all. I have a general question about how to properly interpret the uncertainty of posterior predictive distributions. In some of my posterior predictive checks, the SD or HDIs of the posterior predictive distribution are either extremely wide (i.e. their boundaries lie far from the observed data/mean) or extremely narrow (i.e. barely above/below the observed data/mean). My intuition is that neither situation is ideal, but I cannot find good resources, arguments or heuristics on how to interpret that lack or excess of uncertainty. Any thoughts or references would be really appreciated.
Hi Simon, regarding resources I’d recommend Kruschke’s Doing Bayesian Data Analysis (“the puppy book”). Regarding your question, the thing to keep in mind is that posterior predictive samples are effectively new data that “could” have been observed, given the model structure and the posterior parameter distributions inferred by sampling. E.g. in a simple Bayesian linear regression model y = b*x + a + epsilon, where epsilon is stochastic noise (e.g. modelled by a normal distribution), the posterior predictive distribution of y accounts for the uncertainty in a, b, and epsilon.
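To make the a/b/epsilon point concrete, here is a minimal NumPy sketch. It assumes you already have posterior draws of a, b and the noise SD sigma; the normal "draws" below are hypothetical stand-ins for what a sampler (e.g. PyMC) would return, and x_new is an arbitrary illustrative input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for posterior draws of (a, b, sigma); in practice these
# would come from an MCMC sampler fitted to the observed data.
n_draws = 4000
a = rng.normal(1.0, 0.1, n_draws)                # intercept draws
b = rng.normal(2.0, 0.05, n_draws)               # slope draws
sigma = np.abs(rng.normal(0.5, 0.02, n_draws))   # noise SD draws

x_new = 3.0  # illustrative input value

# Expected value of y at x_new: reflects parameter uncertainty only (a and b).
mu = a + b * x_new

# Posterior predictive draws: one simulated y per posterior draw, adding epsilon.
y_rep = mu + rng.normal(0.0, sigma)

print(f"SD of mu (parameters only):       {mu.std():.3f}")
print(f"SD of y_rep (parameters + noise): {y_rep.std():.3f}")
```

By the law of total variance, Var(y_rep) = Var(mu) + E[sigma^2], so the predictive spread can never be narrower than the noise level the model has inferred. If predictive intervals look much tighter than the scatter of the data, it is worth checking whether sigma is being estimated as intended.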
If your posterior predictive y samples don’t resemble your original data in terms of general scale, variability etc., this could be a symptom of your model not being specified quite as intended. Bear in mind also that if you are extrapolating, i.e. generating samples outside the domain of the original data (e.g. for a vastly different set of x in the regression model above), or exploring parts of the domain not well covered by the original data, this could also explain increased variance or other oddities. Think of how y given x in y = b*x + a varies when x is large and the posterior of b is quite wide, i.e. the gradient is uncertain.
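A minimal sketch of the extrapolation effect, again with hypothetical posterior draws standing in for real sampler output, and assuming (for illustration only) that the observed x values lay roughly in [0, 5]. With independent posterior draws of a and b, Var(a + b*x) = Var(a) + x^2 Var(b), so the standard deviation of the parameter part grows roughly linearly in |x| once x is large; with correlated draws there is an extra 2x Cov(a, b) term. The interval below is the equal-tailed central interval from np.percentile, not an HDI, though for a symmetric unimodal predictive the two nearly coincide.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior draws; assume the observed x values lay roughly in [0, 5].
n_draws = 4000
a = rng.normal(1.0, 0.2, n_draws)
b = rng.normal(2.0, 0.3, n_draws)                # fairly uncertain slope
sigma = np.abs(rng.normal(0.5, 0.02, n_draws))


def interval_width(x, prob=0.94):
    """Width of the equal-tailed central `prob` interval of predictive draws at x."""
    y_rep = a + b * x + rng.normal(0.0, sigma)
    lo, hi = np.percentile(y_rep, [50 * (1 - prob), 50 * (1 + prob)])
    return hi - lo


for x in [2.5, 5.0, 50.0]:
    print(f"x = {x:5.1f}: 94% predictive interval width = {interval_width(x):.2f}")
```

Inside the data range the width is dominated by the noise sigma; at x = 50 it is dominated by the uncertainty in b, which is exactly the "uncertain gradient" effect described above.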
Hope this helps! And do check out the book (I have a hardcopy, since it is a great read from cover to cover)
Hi. Thank you for the reply. I’ve read the puppy book, but I don’t recall a detailed discussion of how to interpret posterior predictive uncertainty (if I missed it, in which chapter can I find it?). I’m aware of how the marginalization behind the posterior predictive distribution works, but I was hoping to find resources that discuss in more mathematical and/or methodological detail the implications of extreme uncertainty, or lack thereof, in the posterior predictive distribution with respect to the original data. (This seems a bit more intuitive when extrapolating, though a discussion of uncertainty in predictions on unobserved data would be quite useful as well.)
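One rough way to make "extremely wide" or "extremely narrow" concrete is empirical coverage: the fraction of observed points that fall inside, say, the 94% posterior predictive interval. The sketch below is a simplified illustration under stated assumptions: the "observed" data are simulated from a known line with noise SD 0.5, the regression line is held at its true value, and only the noise SD used to generate predictive draws (the hypothetical parameter sigma_model) is varied.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical "observed" data from a known process: y = 2x + 1 + N(0, 0.5).
x_obs = np.linspace(0, 5, 100)
y_obs = 1.0 + 2.0 * x_obs + rng.normal(0.0, 0.5, x_obs.size)


def coverage(sigma_model, prob=0.94, n_draws=2000):
    """Fraction of observed points inside the central `prob` predictive interval.

    For simplicity the regression line is fixed at its true value; only the
    noise SD used to simulate predictive draws (sigma_model) is varied.
    """
    mu = 1.0 + 2.0 * x_obs                                    # shape (n_obs,)
    y_rep = mu + rng.normal(0.0, sigma_model, (n_draws, x_obs.size))
    lo, hi = np.percentile(y_rep, [50 * (1 - prob), 50 * (1 + prob)], axis=0)
    return np.mean((y_obs >= lo) & (y_obs <= hi))


for s in [0.1, 0.5, 2.5]:
    print(f"model noise SD = {s}: empirical coverage of 94% interval = {coverage(s):.2f}")
```

Coverage well below the nominal level suggests overconfident (too narrow) predictions; coverage at or near 100% together with very wide intervals suggests underconfidence. Checking coverage on the same data used for fitting is optimistic, so leave-one-out PIT checks (e.g. ArviZ's plot_loo_pit) are a more principled variant of the same idea.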