I have a limited sample of age and height data (d) taken from the full data set in McElreath's book *Statistical Rethinking*: age_height.csv (1.9 KB)

I divided the data into training and testing sets:

`train, test = train_test_split(d, test_size=0.5, random_state=13)`
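The models below use `train.age_std`; I am assuming this is the z-scored age column. A minimal sketch of that preprocessing on toy data (the real `d` comes from `age_height.csv`, which isn't reproduced here):

```python
import numpy as np
import pandas as pd

# Toy stand-in for age_height.csv (the real d has age and height columns)
rng = np.random.default_rng(13)
age = rng.uniform(0, 80, 100)
d = pd.DataFrame({'age': age,
                  'height': 100 + 0.5 * age + rng.normal(0, 10, size=100)})

# Standardize age so higher polynomial terms stay on comparable scales
d['age_std'] = (d['age'] - d['age'].mean()) / d['age'].std()
```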

Then I built 6 polynomial models with **FLAT, UN-INFORMATIVE** priors. Here's an example of model number 3, which has three polynomial terms.

```python
age_m3 = np.vstack((train.age_std, train.age_std**2, train.age_std**3))

with pm.Model() as m3:
    alpha = pm.Normal('alpha', mu=0, sd=200, testval=a_start)
    beta = pm.Normal('beta', mu=0, sd=50, shape=3)
    sigma = pm.HalfCauchy('sigma', beta=30, testval=sigma_start)
    mu = pm.Deterministic('mu', alpha + pm.math.dot(beta, age_m3))
    Height = pm.Normal('Height', mu=mu, sd=sigma, observed=train['height'])
    trace_m3 = pm.sample(1000, tune=1000, random_seed=13)
```
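One way to see just how flat these priors are is a quick prior-predictive check on mu. A numpy-only sketch (not PyMC's prior-predictive machinery; I'm evaluating at age_std = 1 as an arbitrary example):

```python
import numpy as np

# Draw from the flat priors above: alpha ~ N(0, 200), beta_k ~ N(0, 50)
rng = np.random.default_rng(13)
n = 10_000
alpha = rng.normal(0, 200, n)
beta = rng.normal(0, 50, (n, 3))

x = 1.0  # age one standard deviation above the mean
mu = alpha + beta @ np.array([x, x**2, x**3])

# The implied mean heights span several hundred cm on either side of zero,
# far wider than any plausible human height
print(np.percentile(mu, [2.5, 97.5]))
```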

I also built 6 polynomial models with **INFORMATIVE** priors. Here's an example of model number 3 again.

```python
with pm.Model() as m3_tighter_prior:
    alpha = pm.Normal('alpha', mu=0, sd=100, testval=a_start)
    beta = pm.Normal('beta', mu=0, sd=10, shape=3)
    sigma = pm.HalfCauchy('sigma', beta=10, testval=sigma_start)
    mu = pm.Deterministic('mu', alpha + pm.math.dot(beta, age_m3))
    Height = pm.Normal('Height', mu=mu, sd=sigma, observed=train['height'])
    trace_m3_tighter_prior = pm.sample(1000, tune=1000, random_seed=13)
```
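The mechanism by which tighter priors help is shrinkage. A hypothetical one-coefficient sketch (the `posterior_mean` helper and the numbers are made up, not taken from the models above): for a Normal likelihood with known standard error and a Normal(0, tau) prior, the posterior mean is a precision-weighted pull of the MLE toward zero.

```python
def posterior_mean(mle, se, tau):
    # Precision-weighted average of the prior mean (0) and the MLE:
    # the likelihood gets weight 1/se^2, the prior gets weight 1/tau^2
    w = (1 / se**2) / (1 / se**2 + 1 / tau**2)
    return w * mle

mle, se = 12.0, 4.0
print(posterior_mean(mle, se, tau=50))  # flat-ish prior: barely shrunk
print(posterior_mean(mle, se, tau=10))  # tighter prior: noticeably shrunk
```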

To finish, I calculated the in-sample deviance and the WAIC value (estimated out-of-sample deviance) for both the informative and un-informative priors.

For example, here is the in-sample deviance for model 3 with informative priors.

```python
theta_3_tighter_prior = pm.summary(trace_m3_tighter_prior)['mean'][:5]
dev_3_in_sample_tighter_prior = -2 * sum(stats.norm.logpdf(
    train['height'],
    loc=(theta_3_tighter_prior[0]
         + theta_3_tighter_prior[1] * train['age_std']
         + theta_3_tighter_prior[2] * train['age_std']**2
         + theta_3_tighter_prior[3] * train['age_std']**3),
    scale=theta_3_tighter_prior[4]))
```
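As a sanity check on the formula above: deviance is just −2 times the summed log-likelihood. A tiny self-contained demo with made-up numbers (not the book's data):

```python
import numpy as np
from scipy import stats

# Observed heights and a fitted Normal(mu, sigma) -- all values made up
heights = np.array([150.0, 160.0, 170.0])
mu, sigma = 160.0, 10.0

# Deviance = -2 * sum of pointwise log-likelihoods
deviance = -2 * stats.norm.logpdf(heights, loc=mu, scale=sigma).sum()
print(round(deviance, 3))  # about 21.329
```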

WAIC value for model 3 with informative priors:

`waic_tighter_3 = pm.waic(trace_m3_tighter_prior, m3_tighter_prior).WAIC`

Plotting all deviances:

As expected, simulated in-sample deviances (red/green) are lower than WAIC values (estimated out-of-sample deviances, blue/gray), because the fitted models tend to overfit the training data, so in-sample deviance understates out-of-sample deviance.

My questions:

**1. In theory, the WAIC values for informative-prior models are supposed to be lower than the WAIC values for un-informative-prior models. Why does this hold only for the models with 3 and 4 parameters?**

**2. The difference between green and red (in-sample deviances of informative and un-informative priors) is smallest at model 3. Does that mean anything?**

**Question 3:**

Models with UN-informative priors

| | m1 | m2 | m3 | m4 | m5 | m6 |
|---|---|---|---|---|---|---|
| alpha | 145.860602 | 156.110797 | 156.689380 | 156.944409 | 156.026089 | 156.222594 |
| sigma | 16.107881 | 12.437511 | 10.077042 | 10.316230 | 10.010530 | 9.983804 |

Models with informative priors

| | m1 | m2 | m3 | m4 | m5 | m6 |
|---|---|---|---|---|---|---|
| alpha | 145.484445 | 153.437437 | 154.973432 | 153.968937 | 154.249675 | 154.171092 |
| sigma | 15.919727 | 12.373770 | 9.781053 | 9.873798 | 10.223518 | 10.208682 |

As more parameters are added, the model residual (sigma) of the models with UN-informative priors decreases. With informative priors, however, the lowest model residuals belong to models 3 and 4.
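That monotone decrease under flat priors is expected: for nested least-squares-style fits, in-sample residual spread can only shrink (or stay flat) as polynomial terms are added, whether or not the extra terms capture anything real. A toy numpy sketch (made-up data where the truth is linear, not the book's data):

```python
import numpy as np

rng = np.random.default_rng(13)
x = np.linspace(-2, 2, 50)
y = 150 + 8 * x + rng.normal(0, 5, 50)  # true relationship is linear

# In-sample residual sd never increases as the polynomial degree grows,
# even though every term beyond degree 1 only fits noise
for degree in range(1, 7):
    coeffs = np.polyfit(x, y, degree)
    resid_sd = np.std(y - np.polyval(coeffs, x))
    print(degree, round(resid_sd, 3))
```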

**Is a lower model residual a sign that the age variables explain more of the variation in height? Is this a sign that models 3 or 4 are the most accurate at retrodicting height?**