# Is the Prior Predictive Distribution always useful and usable?

I’m wondering about how to use the prior predictive distribution (PrPD) and about its usefulness.

The literature on the subject is vast, so I will only quote Gelman (The Prior Can Often Only Be Understood in the Context of the Likelihood), who says: “A fundamental tool for understanding the effect of the prior on inference before data has been collected is the prior predictive distribution […] The careful application […] leads us to some concrete recommendations of how to choose a prior that ensures robust Bayesian analyses in practice.”

Ok, but let’s take the concrete example of a very simple beta-binomial model. How can we use the prior predictive distribution in that case?

As an example, suppose I plan an experiment consisting of $n=73$ Bernoulli draws with parameter $\theta$ and decide to use a $\mathrm{Beta}(0.5, 0.5)$ distribution as the prior. I know that the likelihood is binomial, depending on the number $x$ of successes.

I can then sample from my prior and, for each randomly drawn $\theta_i$, sample from the corresponding likelihood to get a random $x_i$. As far as I know, the histogram of the $x_i$ thus obtained (ranging between 0 and 73) corresponds to the prior predictive distribution, which in my case will look like the $\mathrm{Beta}(0.5, 0.5)$ distribution. Is that right?
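For reference, the two-step sampling described above can be sketched in Python with NumPy. This is only a minimal illustration: the seed and the number of draws (`n_sims`) are arbitrary assumptions, not part of the original setup.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily (assumption)

n = 73            # planned number of Bernoulli draws
a, b = 0.5, 0.5   # Beta prior parameters
n_sims = 10_000   # number of prior predictive draws (assumption)

# Step 1: draw theta_i from the Beta(a, b) prior.
theta = rng.beta(a, b, size=n_sims)

# Step 2: for each theta_i, draw x_i from the Binomial(n, theta_i) likelihood.
x = rng.binomial(n, theta)

# Relative frequency of each possible outcome 0..n:
# this is the simulated prior predictive distribution.
counts = np.bincount(x, minlength=n + 1)
freq = counts / n_sims

print("P(x = 0)  ~", freq[0])
print("P(x = 36) ~", freq[36])
```

The simulated $x_i$ follow a beta-binomial distribution, which for these parameters is U-shaped over 0 to 73, mirroring the shape of the prior.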

If so, what “concrete recommendations” could I draw from it? If I then perform my intended experiment and get, say, $x=46$, how could I use the PrPD I just built to “choose a prior that ensures robust Bayesian analyses in practice”?

Am I going to say that my PrPD leads me to predict values of $x$ close to 0 or 73 and that, since my value 46 is not in these “areas”, my prior is unsuitable? And that I should instead use a uniform $\mathrm{Beta}(1, 1)$ prior? In that case, I would be adjusting my prior according to my experimental result… is this normal?

Or is it simply that the PrPD is not worth considering for a Bayesian model as simple as the one I just described?

The prior predictive distribution is useful as a tool for exploring the behavior of your model conditional on the priors you have specified. If you have a lot of data, prior specification can matter less, but in cases where data are sparse, estimates will be shrunk toward the prior, so it’s nice to know what that implies for your model.

For example, in sports analytics it can be useful to use your priors to represent the population distribution of the parameter (since that encompasses the target of your inference and prediction most of the time, and we have really good population-wide data). So, sampling from the prior predictive should return quantities that look like your overall population, and from which parameters corresponding to individuals within your analysis might be drawn.

For your particular problem, choosing between two quite uninformative priors–Beta(1,1) vs Beta(0.5, 0.5)–with a sample size of 73 will make little difference, so you don’t have much to gain from running a prior predictive check.
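To put rough numbers on this, here is a small sketch using the conjugate Beta update and the hypothetical result $x=46$ from the question. The helper names `beta_posterior` and `beta_mean_sd` are made up for this illustration.

```python
from math import sqrt

n, x = 73, 46  # sample size and the hypothetical observed successes

def beta_posterior(a, b, x, n):
    """Conjugate update: Beta(a, b) prior plus x successes in n trials."""
    return a + x, b + (n - x)

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)

priors = {"Beta(0.5, 0.5)": (0.5, 0.5), "Beta(1, 1)": (1.0, 1.0)}
results = {}
for name, (a, b) in priors.items():
    post_a, post_b = beta_posterior(a, b, x, n)
    results[name] = beta_mean_sd(post_a, post_b)
    mean, sd = results[name]
    print(f"{name}: posterior Beta({post_a}, {post_b}), mean={mean:.4f}, sd={sd:.4f}")
```

The two posterior means differ by far less than one posterior standard deviation, which is the sense in which the choice between these two priors matters little here.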

In general, it’s a cheap and easy way to check on the behavior of your model before it sees any data, and can sometimes catch some misspecification. I don’t do it all the time, but when I don’t I often wish I had!


Thanks to Chris Fonnesbeck for weighing in on the issue; however, I am not entirely convinced. After all, this is a beta-binomial model where the number of Bernoulli draws is supposed to be fixed in advance ($n=73$). In this case, before any data come in, the family of the prior distribution is quickly settled: what else could it be but a Beta distribution? Then, whether it’s $\mathrm{Beta}(0.5, 0.5)$ or $\mathrm{Beta}(5, 5)$ doesn’t change much in my opinion: the prior predictive distribution can only reflect the chosen prior distribution, if I’m not mistaken…
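For what it’s worth, the prior predictive of this model has a closed form (the beta-binomial distribution), so the two priors can be compared exactly rather than by simulation. The sketch below does that; the cut-offs used for “tails” and “middle” are arbitrary choices made only for illustration.

```python
from math import lgamma, exp

def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(x, n, a, b):
    """Prior predictive P(x) for x successes in n trials under a Beta(a, b) prior."""
    log_choose = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return exp(log_choose + log_beta_fn(x + a, n - x + b) - log_beta_fn(a, b))

n = 73
summary = {}
for a, b in [(0.5, 0.5), (5.0, 5.0)]:
    pmf = [beta_binomial_pmf(x, n, a, b) for x in range(n + 1)]
    tails = sum(pmf[:8]) + sum(pmf[-8:])  # x <= 7 or x >= 66 (arbitrary cut-offs)
    middle = sum(pmf[30:44])              # 30 <= x <= 43 (arbitrary cut-offs)
    summary[(a, b)] = (sum(pmf), tails, middle)
    print(f"Beta({a}, {b}): P(tails)={tails:.3f}, P(middle)={middle:.3f}")
```

Comparing the mass each prior predictive places near 0 or 73 with the mass it places around 73/2 shows how strongly the prior predictive inherits the shape of the prior.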

The question of what is gained by doing this remains open.

If I then consider only one experiment and find a result of $x=46$ successes, it seems to me that I cannot deduce anything from it and that building a prior predictive distribution is of no value, because in this case no prior predictive check is possible. Am I right?

If, on the other hand, I carry out a large number of experiments and my results cluster around $\bar{x}=36 \approx 73/2$, I could probably deduce that a $\mathrm{Beta}(5, 5)$ prior is better suited to the model than a $\mathrm{Beta}(0.5, 0.5)$ prior (which tends to predict results close to 0 or 73); right?

But doesn’t that mean that I am now outside the scope of a prior predictive check, since I am no longer testing the model “before it sees any data”?

Finally, for something as simple as a beta-binomial model with a single parameter $\theta$, I still don’t see the point of calculating a prior predictive distribution… But perhaps there are other techniques that I am not aware of?