I have limited sample of age and height data (d) take from the full data set from McElreath’s book: Statistical Rethinking age_height.csv (1.9 KB)
I divided the data into training and testing sets:
train, test = train_test_split(d, test_size=0.5, random_state=13)
Then, I build 6 polynomial models with FLAT, UN-INFORMATIVE priors. Here’s an example of model number 3 with three parameters.
age_m3 = np.vstack((train.age_std,train.age_std**2,train.age_std**3))
with pm.Model() as m3:
alpha = pm.Normal('alpha', mu=0, sd=200, testval=a_start)
beta = pm.Normal('beta', mu=0, sd=50, shape=3)
sigma = pm.HalfCauchy('sigma', beta=30, testval=sigma_start)
mu = pm.Deterministic('mu', alpha + pm.math.dot(beta, age_m3))
Height = pm.Normal('Height', mu=mu, sd=sigma, observed=train['height'])
trace_m3 = pm.sample(1000, tune=1000, random_seed=13)
I also build 6 polynomial models with INFORMATIVE priors. Here’s an example of model number 3 again.
with pm.Model() as m3_tighter_prior:
alpha = pm.Normal('alpha', mu=0, sd=100, testval=a_start)
beta = pm.Normal('beta', mu=0, sd=10, shape=3)
sigma = pm.HalfCauchy('sigma', beta=10, testval=sigma_start)
mu = pm.Deterministic('mu',alpha + pm.math.dot(beta, age_m3))
Height = pm.Normal('Height', mu=mu, sd=sigma, observed=train['height'])
trace_m3_tighter_prior = pm.sample(1000, tune=1000, random_seed=13)
To finish, I calculate the in-sample deviance and the WAIC value (estimated out-of-sample deviance) for informative and un-informative priors.
For example, here is the in-sample deviance for model 3 of informative priors.
theta_3_tighter_prior = pm.summary(trace_m3_tighter_prior)['mean'][:5]
dev_3_in_sample_tighter_prior = - 2 * sum(stats.norm.logpdf(train['height'],
loc = theta_3_tighter_prior[0] +
theta_3_tighter_prior[1] * train['age_std'] +
theta_3_tighter_prior[2] * train['age_std']**2 +
theta_3_tighter_prior[3] * train['age_std']**3,
scale = theta_3_tighter_prior[4]))
WAIC value of model 3 of informative priors.
waic_tighter_3 = pm.waic(trace_m3_tighter_prior, m3_tighter_prior).WAIC
Plotting all deviances:
As expected, simulated in-sample deviances (red/green) are lower than WAIC values (estimated out-of-sample deviances, blue/gray), because in-sample deviances tend to overfit the data.
My question:
1. In theory, the WAIC values for informative-prior models are supposed to be lower than WAIC values for un-informative-prior models. Why is this the case only for models with 3 and 4 parameters?
2. The difference between green and red (in-sample deviances of informative and un-informative priors) is smallest at model 3. Does that mean anything?
Question 3:
Models with UN-informative priors
m1 | m2 | m3 | m4 | m5 | m6 | |
---|---|---|---|---|---|---|
alpha | 145.860602 | 156.110797 | 156.689380 | 156.944409 | 156.026089 | 156.222594 |
sigma | 16.107881 | 12.437511 | 10.077042 | 10.316230 | 10.010530 | 9.983804 |
Models with informative priors
m1 | m2 | m3 | m4 | m5 | m6 | |
---|---|---|---|---|---|---|
alpha | 145.484445 | 153.437437 | 154.973432 | 153.968937 | 154.249675 | 154.171092 |
sigma | 15.919727 | 12.373770 | 9.781053 | 9.873798 | 10.223518 | 10.208682 |
As more parameters are added, the model residual (sigma) of models with UN-informative priors decrease. However, with informative priors, the lowest model residuals belong to model 3 and 4.
Is a lower model residual a sign that the variables of age explain more of the variation in height? Is this a sign that model 3 or 4 are the most accurate at Retrodicting height?