I have a limited sample of age and height data (d) taken from the full data set in McElreath's book *Statistical Rethinking*: age_height.csv (1.9 KB)
I divided the data into training and testing sets:
```python
from sklearn.model_selection import train_test_split

train, test = train_test_split(d, test_size=0.5, random_state=13)
```
Then I built 6 polynomial models with FLAT, UN-INFORMATIVE priors. Here's an example of model number 3, which has three polynomial terms:
```python
age_m3 = np.vstack((train.age_std, train.age_std**2, train.age_std**3))

with pm.Model() as m3:
    alpha = pm.Normal('alpha', mu=0, sd=200, testval=a_start)
    beta = pm.Normal('beta', mu=0, sd=50, shape=3)
    sigma = pm.HalfCauchy('sigma', beta=30, testval=sigma_start)
    mu = pm.Deterministic('mu', alpha + pm.math.dot(beta, age_m3))
    Height = pm.Normal('Height', mu=mu, sd=sigma, observed=train['height'])
    trace_m3 = pm.sample(1000, tune=1000, random_seed=13)
```
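(For context, `age_std` is assumed to be a standardized version of the age column, i.e. centered and scaled to unit standard deviation, as is standard practice for polynomial regression in *Statistical Rethinking*. A minimal sketch, with a fabricated stand-in for the real data frame:)

```python
import pandas as pd

# Hypothetical stand-in for the data frame d loaded from age_height.csv;
# the numbers here are fabricated for illustration only.
d = pd.DataFrame({"age": [1.0, 5.0, 10.0, 20.0, 40.0],
                  "height": [75.0, 108.0, 137.0, 170.0, 165.0]})

# Standardize age: subtract the mean and divide by the standard deviation,
# so age_std has mean 0 and (sample) standard deviation 1.
d["age_std"] = (d["age"] - d["age"].mean()) / d["age"].std()
```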
I also built 6 polynomial models with INFORMATIVE priors. Here's model number 3 again:
```python
with pm.Model() as m3_tighter_prior:
    alpha = pm.Normal('alpha', mu=0, sd=100, testval=a_start)
    beta = pm.Normal('beta', mu=0, sd=10, shape=3)
    sigma = pm.HalfCauchy('sigma', beta=10, testval=sigma_start)
    mu = pm.Deterministic('mu', alpha + pm.math.dot(beta, age_m3))
    Height = pm.Normal('Height', mu=mu, sd=sigma, observed=train['height'])
    trace_m3_tighter_prior = pm.sample(1000, tune=1000, random_seed=13)
```
To finish, I calculate the in-sample deviance and the WAIC value (estimated out-of-sample deviance) for informative and un-informative priors.
For example, here is the in-sample deviance for model 3 with informative priors:
```python
# Posterior means in order: [alpha, beta[0], beta[1], beta[2], sigma]
theta_3_tighter_prior = pm.summary(trace_m3_tighter_prior)['mean'][:5].values

dev_3_in_sample_tighter_prior = -2 * np.sum(
    stats.norm.logpdf(
        train['height'],
        loc=(theta_3_tighter_prior[0]
             + theta_3_tighter_prior[1] * train['age_std']
             + theta_3_tighter_prior[2] * train['age_std']**2
             + theta_3_tighter_prior[3] * train['age_std']**3),
        scale=theta_3_tighter_prior[4]))
```
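As a sanity check on the deviance calculation itself, `-2 * sum(logpdf)` for a Normal likelihood can be compared against the closed-form expression. A tiny self-contained sketch with fabricated numbers:

```python
import numpy as np
from scipy import stats

# Fabricated toy data and fitted means, for illustration only
y = np.array([150.0, 155.0, 160.0])
mu = np.array([152.0, 154.0, 161.0])
sigma = 5.0

# Deviance = -2 * total log-likelihood
deviance = -2 * np.sum(stats.norm.logpdf(y, loc=mu, scale=sigma))

# Closed form for a Normal likelihood:
# n*log(2*pi*sigma^2) + sum((y - mu)^2) / sigma^2
by_hand = len(y) * np.log(2 * np.pi * sigma**2) + np.sum((y - mu)**2) / sigma**2
```

The two quantities agree, which confirms the deviance formula used above.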
And here is the WAIC value for model 3 with informative priors:
```python
waic_tighter_3 = pm.waic(trace_m3_tighter_prior, m3_tighter_prior).WAIC
```
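For intuition about what `pm.waic` is estimating, the WAIC formula can be sketched directly from a matrix of pointwise log-likelihoods: WAIC = -2 * (lppd - p_waic), where lppd is the log pointwise predictive density and p_waic the effective number of parameters. A sketch on a fabricated log-likelihood matrix (not the actual model output):

```python
import numpy as np

# Fabricated pointwise log-likelihoods for illustration:
# rows = posterior draws, columns = observations
rng = np.random.default_rng(13)
log_lik = rng.normal(loc=-3.0, scale=0.1, size=(200, 10))

# lppd: for each observation, log of the mean (over draws) likelihood
lppd = np.sum(np.log(np.mean(np.exp(log_lik), axis=0)))

# p_waic: sum over observations of the variance (over draws) of the log-likelihood
p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))

waic = -2 * (lppd - p_waic)
```

The penalty term p_waic grows with model flexibility, which is why WAIC can rank a tighter-prior model above a flat-prior one even when the flat-prior model fits the training data better.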
Plotting all deviances:
As expected, the in-sample deviances (red/green) are lower than the WAIC values (estimated out-of-sample deviances, blue/gray), because the in-sample deviance is computed on the same data the model was fit to and therefore understates out-of-sample error.
1. In theory, models with informative priors should have lower WAIC values than models with un-informative priors. Why is this the case only for the models with 3 and 4 parameters?
2. The difference between green and red (in-sample deviances of informative and un-informative priors) is smallest at model 3. Does that mean anything?
Models with UN-informative priors
Models with informative priors
As more parameters are added, the model residual (sigma) of the models with UN-informative priors decreases. With informative priors, however, the lowest model residuals belong to models 3 and 4.
Is a lower model residual a sign that the age terms explain more of the variation in height? Is it a sign that models 3 and 4 are the most accurate at retrodicting height?