Bayesian Regression: Inconsistent Prior Predictive Checks

That’s the point of prior predictive checks: they let you calibrate your priors before running the model. The problem is that your approach goes the opposite way; you’re trying to make your distributions fit pre-assigned prior parameter values (obtained from point estimates). That’s not going to work. LogNormal distributions have very long tails, so a great number of near-zero values occur alongside rare extreme draws in the tail, which will push your mean towards zero. That does not necessarily mean the model is worse at inference (the same goes for WAIC or LOO model comparison, which are more indicative of predictive capacity). However, a more constrained model could be better for theoretical reasons (again, that’s what you must decide according to your domain knowledge).
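To see that tail behaviour concretely, here is a standalone NumPy sketch (not part of the model below; the mu = 4, sigma = 0.8 values just mirror the simulation further down). The mean lands well above the median, which is what the heavy right tail does:

```python
import numpy as np

# Illustrative only: LogNormal draws are heavily right-skewed, so the
# mean sits well above the median, and much of the mass bunches up at
# small values relative to the tail.
rng = np.random.default_rng(27)
draws = rng.lognormal(mean=4, sigma=0.8, size=100_000)

print(draws.mean())     # pulled up by rare extreme draws
print(np.median(draws)) # noticeably smaller than the mean
```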

A Normal distribution is definitely inappropriate, as it allows for negative values, but maybe a TruncatedNormal could work better. It’s also better to start the prior predictive checks from less informative priors, unless they come from very clear theoretical knowledge, or from a previous Bayesian model (prior update).
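As a quick standalone illustration of why the truncation matters (arbitrary mu/sigma, using SciPy rather than PyMC): a Normal produces negative draws, while a TruncatedNormal with a lower bound of 0 cannot. Note that SciPy's `truncnorm` takes its bounds on the standardized scale, `a = (lower - mu) / sigma`:

```python
import numpy as np
from scipy import stats

mu, sigma = 1.0, 2.0  # illustrative values, not the model's priors
rng = np.random.default_rng(0)

normal_draws = stats.norm(loc=mu, scale=sigma).rvs(10_000, random_state=rng)
trunc = stats.truncnorm(a=(0 - mu) / sigma, b=np.inf, loc=mu, scale=sigma)
trunc_draws = trunc.rvs(10_000, random_state=rng)

print((normal_draws < 0).mean())  # a clear fraction of negative values
print(trunc_draws.min())          # never below 0
```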

import numpy as np
import matplotlib.pyplot as plt
import pymc as pm
import arviz as az

r = 0.8
alpha = 4
sigma = 0.8

x0 = np.sort(pm.draw(pm.LogNormal.dist(alpha, sigma), draws=100, random_seed=27))
x = np.sort(pm.draw(pm.LogNormal.dist(alpha, sigma), draws=100, random_seed=77))

y = r*x + np.sqrt(1-r**2)*x0  

with pm.Model() as mod:
    alpha_m = pm.HalfNormal("alpha", 1) #intercept
    beta_m = pm.HalfNormal("beta", 1) #slope
    mu = pm.Deterministic("mu", alpha_m + beta_m*x) #mean
    sigma_m = pm.HalfNormal("sigma", 10) #sd (renamed to avoid shadowing the sigma above)
    y_hat = pm.TruncatedNormal("y", mu=mu, sigma=sigma_m, lower=0, observed=y)

with mod:
    prior_pred = pm.sample_prior_predictive() 

az.plot_ppc(prior_pred, group="prior")
az.plot_ppc(prior_pred, group="prior", kind="cumulative")

[prior predictive plots: density and cumulative checks]

This definitely looks better than the LogNormal, and allows you to have a more appropriate likelihood (no negative values). Now, it’s important to keep in mind that the simulated data come from a LogNormal distribution, which is why my model in the previous example can recover the parameters with reasonable precision. The TruncatedNormal model does a reasonable job for the mean (alpha), but it’s a bit off for the sd (sigma).

Comparing the 3 models seems useful as well (though this is often not recommended as a method for model selection); note that the Normal model now has 0 weight, and the LogNormal gives a warning (something may be mis-specified):

| model | rank | elpd_loo | p_loo | elpd_diff | weight | se | dse | warning | scale |
|---|---|---|---|---|---|---|---|---|---|
| TruncatedNormal | 0 | -362.1440699 | 9.12993812 | 0 | 0.962512852 | 15.11492559 | 0 | FALSE | log |
| Normal | 1 | -362.8778718 | 9.6356368 | 0.733801882 | 0 | 15.43134132 | 0.521035978 | FALSE | log |
| LogNormal | 2 | -480.1654312 | 6.444936509 | 118.0213612 | 0.037487148 | 12.77992679 | 14.71650608 | TRUE | log |

An important note: the point of assessing your models with toy data (or generative-model simulations) is to test whether you can retrieve the original parameters using relatively uninformative priors in your model. So, you should not use the same alpha for your HalfNormal priors. Also, beta_m is a prior for the slope, so using alpha there does not seem appropriate. Additionally, HalfNormal’s default parameter is a scale parameter (sigma), not the same as the location parameter alpha you used for the LogNormal. All these details may seem a bit nitpicky, but ultimately knowing them very well can greatly help with prior selection. Hope this helps.
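To make that last point concrete (a standalone SciPy sketch, not from the model above): HalfNormal's parameter is a scale, so the distribution's mean is sigma * sqrt(2/pi). Plugging a LogNormal location value in as if it were a location does something quite different:

```python
import numpy as np
from scipy import stats

sigma = 4.0  # if you reuse a LogNormal location value here, it acts as a scale
rng = np.random.default_rng(3)
draws = stats.halfnorm(scale=sigma).rvs(200_000, random_state=rng)

print(draws.mean())                 # close to sigma * sqrt(2 / pi), not sigma
print(sigma * np.sqrt(2 / np.pi))
```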
