That’s the point of prior predictive checks: they let you calibrate your priors before running the model. The problem is that your approach goes the opposite way: you’re trying to make your distributions fit pre-assigned prior parameter values (obtained from point estimates), and that’s not going to work. LogNormal distributions have very long tails, so a great amount of near-zero values occurs alongside the long tail of the distribution, which will push your mean toward zero. That does not necessarily mean that the model is worse at inference (the same goes for WAIC or LOO model comparison, which are more indicative of predictive capacity). However, a more constrained model could be better for theoretical reasons (again, that’s something you must decide according to your domain knowledge).
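To see concretely what a prior predictive check can reveal about the LogNormal’s tail, here is a minimal sketch using plain NumPy (same alpha=4, sigma=0.8 as the simulation below; the numeric comparisons are illustrative, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(27)

# LogNormal with the parameters used in the simulation below (alpha=4, sigma=0.8)
draws = rng.lognormal(mean=4.0, sigma=0.8, size=100_000)

median = np.median(draws)  # theoretical median: exp(4) ~= 54.6
mean = draws.mean()        # theoretical mean: exp(4 + 0.8**2 / 2) ~= 75.2
print(f"median ~ {median:.1f}, mean ~ {mean:.1f}")

# The long right tail drags the mean well above the median; plotting these
# draws before sampling makes that skew visible, which is exactly what a
# prior predictive check is for.
```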
A Normal distribution is definitely inappropriate, as it allows negative values, but maybe a TruncatedNormal could work better. It’s also better to start the prior predictive checks from less informative priors, unless they come from very clear theoretical knowledge or from a previous Bayesian model (a prior update).
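As a quick sanity check of that point, a sketch with scipy.stats (the mu and sd here are illustrative values, not taken from the model below):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(27)
mu, sd = 1.0, 2.0  # illustrative location and scale

normal_draws = stats.norm(mu, sd).rvs(10_000, random_state=rng)

# scipy parameterizes truncnorm's bounds in standard-deviation units
a = (0 - mu) / sd  # lower bound at 0
trunc_draws = stats.truncnorm(a, np.inf, loc=mu, scale=sd).rvs(10_000, random_state=rng)

print((normal_draws < 0).mean())  # a sizable fraction of Normal draws is negative
print(trunc_draws.min())          # truncation keeps the support nonnegative
```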
import numpy as np
import matplotlib.pyplot as plt
import pymc as pm
import arviz as az

r = 0.8
alpha = 4
sigma = 0.8

x0 = np.sort(pm.draw(pm.LogNormal.dist(alpha, sigma), draws=100, random_seed=27))
x = np.sort(pm.draw(pm.LogNormal.dist(alpha, sigma), draws=100, random_seed=77))
y = r*x + np.sqrt(1-r**2)*x0

with pm.Model() as mod:
    alpha_m = pm.HalfNormal("alpha", 1)  # intercept
    beta_m = pm.HalfNormal("beta", 1)  # slope
    mu = pm.Deterministic("mu", alpha_m + beta_m*x)  # mean
    sigma_m = pm.HalfNormal("sigma", 10)  # sd (renamed to avoid shadowing the simulation's sigma)
    y_hat = pm.TruncatedNormal("y", mu=mu, sigma=sigma_m, lower=0, observed=y)

with mod:
    prior_pred = pm.sample_prior_predictive()

az.plot_ppc(prior_pred, group="prior")
az.plot_ppc(prior_pred, group="prior", kind="cumulative")


Which definitely looks better than the LogNormal, and gives you a more appropriate likelihood (no negative values). Now, it’s important to keep in mind that the simulated data come from a LogNormal distribution, which is why the model in the previous example can recover the parameters with reasonable precision. The TruncatedNormal model does a reasonable job for the mean (alpha), but it’s a bit off for the sd (sigma).
Comparing the 3 models seems useful as well (though this is often not recommended as a method for model selection); note that the Normal model now has 0 weight, and the LogNormal gives a warning (something may be mis-specified):
| model | rank | elpd_loo | p_loo | elpd_diff | weight | se | dse | warning | scale |
|---|---|---|---|---|---|---|---|---|---|
| TruncatedNormal | 0 | -362.14 | 9.13 | 0.00 | 0.96 | 15.11 | 0.00 | FALSE | log |
| Normal | 1 | -362.88 | 9.64 | 0.73 | 0.00 | 15.43 | 0.52 | FALSE | log |
| LogNormal | 2 | -480.17 | 6.44 | 118.02 | 0.04 | 12.78 | 14.72 | TRUE | log |
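A common rule of thumb (hedged, since there is no hard cutoff) is that an elpd_diff much larger than about 2 times its dse is unlikely to be noise. Checking that against the LogNormal row of the comparison table above:

```python
# Values taken from the LogNormal row of the comparison table
elpd_diff_lognormal = 118.02
dse_lognormal = 14.72

# How many standard errors away from the best model is the LogNormal?
z = elpd_diff_lognormal / dse_lognormal
print(f"{z:.1f} standard errors")  # ~8, so the LogNormal fits clearly worse here
```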
An important note: the point of assessing your models with toy data (or generative-model simulations) is to test whether you can retrieve the original parameters using relatively uninformative priors in your model. So you should not use the same alpha for your HalfNormal priors. Also, beta_m is a prior for the slope, so using alpha there does not seem appropriate. Additionally, HalfNormal’s default parameter is a scale parameter (sigma), not the same as the location parameter alpha you used for the LogNormal. These details may seem a bit nitpicky, but ultimately knowing them very well can greatly help with prior selection. Hope this helps.
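To see concretely that HalfNormal’s parameter is a scale rather than a location, a quick NumPy check (reusing alpha=4 as the HalfNormal scale, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(27)
scale = 4.0  # using the post's alpha=4 as a HalfNormal scale, for illustration

# HalfNormal(sigma) is |Normal(0, sigma)|: sigma stretches the distribution,
# it does not shift its center the way a location parameter would.
draws = np.abs(rng.normal(0.0, scale, size=200_000))

print(draws.mean())               # close to scale * sqrt(2/pi), not to scale itself
print(scale * np.sqrt(2 / np.pi)) # ~3.19
```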
