Why does this Bayesian regression analysis fail?

When I do prior predictive checks of regression models I sample and plot regression lines. For example:

import pymc as pm
import xarray as xr
import matplotlib.pyplot as plt

# assumes df is a DataFrame with numeric Date and Height columns
with pm.Model() as mdl:

    intercept = pm.Normal("intercept", mu=0, sigma=20)
    slope = pm.Normal("slope", mu=0, sigma=20)
    sigma = pm.HalfCauchy("sigma", beta=10)

    likelihood = pm.Normal("y", mu=intercept + slope * df.Date, sigma=sigma, observed=df.Height)
    prior = pm.sample_prior_predictive()

# reconstruct the regression line implied by each prior draw
prior.prior["y_model"] = prior.prior["intercept"] + prior.prior["slope"] * xr.DataArray(df.Date)

fig, ax = plt.subplots()
ax.plot(df.Date, prior.prior["y_model"].values[0].T, color="0.5", alpha=0.25)

This generates the following plot:

[image: regression lines sampled from the prior]

Viewed this way, it’s clear that the priors favor models with relatively flat slopes. We can also look at the y-axis scale and decide whether it makes sense given domain knowledge about the quantity being studied. I don’t know what your data represent, but if we were talking about human height, for example, I would be unhappy with the fact that I’m generating models with a mean of plus or minus 20,000 with high probability.
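To make that "plus or minus 20,000" concrete without running PyMC, here is a minimal NumPy-only sketch (the helper name, the x range, and the "tighter" prior values are all hypothetical, not from your model) that samples lines from the priors and measures the spread of the implied regression mean at each x. With slope ~ Normal(0, 20) and x on the order of 1000, the prior mean line already spans tens of thousands:

```python
import numpy as np

rng = np.random.default_rng(42)

def prior_line_spread(slope_sd, intercept_sd, x, n_draws=5000):
    """Sample regression lines from the priors and return the standard
    deviation of the prior-implied mean at each x value: a cheap numeric
    companion to the plotted prior predictive check."""
    intercept = rng.normal(0.0, intercept_sd, size=n_draws)
    slope = rng.normal(0.0, slope_sd, size=n_draws)
    mu = intercept[:, None] + slope[:, None] * x[None, :]  # shape (draws, x)
    return mu.std(axis=0)

x = np.linspace(0.0, 1000.0, 5)            # hypothetical date range
wide = prior_line_spread(20.0, 20.0, x)    # priors matching the model above
narrow = prior_line_spread(0.5, 5.0, x)    # hypothetical tighter priors
```

At the far end of the x range, `wide` comes out near sqrt(20² + (20·1000)²) ≈ 20,000, which matches what the plot shows; the tighter priors shrink that spread by orders of magnitude. Standardizing the predictor (or rescaling the priors to the data's units) is the usual remedy.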
