Hello! I’m following the Statistical Rethinking course in a study group at work and doing the problem for week 2, question 3: stat_rethinking_2023/week02.pdf at main · rmcelreath/stat_rethinking_2023 · GitHub
We have the following dataset (showing 5 rows) (csv link):
height weight age male
0 121.92 19.617854 12.0 1
1 105.41 13.947954 8.0 0
2 86.36 10.489315 6.5 0
3 109.22 15.989118 7.0 0
4 114.30 17.860185 11.0 1
The question is concerned with modelling the total impact of age on weight. My solution for the previous question - ignoring gender - is the following:
import pymc as pm
import arviz as az
import pandas as pd
df = pd.read_csv("Howell1.csv", delimiter=";").query("age < 13").reset_index(drop=True)
with pm.Model() as m:
age = df["age"].to_numpy()
weight = df["weight"].to_numpy()
intercept = pm.Normal(name="intercept", mu=5, sigma=1)
scale_age_weight = pm.Normal(name="scale_age_weight", mu=2, sigma=1)
sigma = pm.LogNormal(name="sigma", mu=0, sigma=1)
likelihood = pm.Normal(
name="likelihood",
mu=intercept + scale_age_weight * age,
sigma=sigma,
observed=weight,
)
trace = pm.sample()
print(az.summary(trace))
# mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
# intercept 7.150 0.335 6.539 7.795 0.008 0.006 1810.0 2441.0 1.0
# scale_age_weight 1.379 0.051 1.284 1.477 0.001 0.001 1745.0 2229.0 1.0
# sigma 2.548 0.142 2.294 2.821 0.003 0.002 2335.0 2402.0 1.0
# mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
# intercept 7.150 0.335 6.539 7.795 0.008 0.006 1810.0 2441.0 1.0
# scale_age_weight 1.379 0.051 1.284 1.477 0.001 0.001 1745.0 2229.0 1.0
# sigma 2.548 0.142 2.294 2.821 0.003 0.002 2335.0 2402.0 1.0
For question 3, we’re interested in two things:
- Computing the above but for each gender
- Finding the contrast distribution (subtracting the female posterior from the male posterior) and calculating the mean.
I can easily solve the first point by filtering the input dataframe for male == 1
and male == 0
. Is there a way I can instead corporate the this into my model?
For the second point, I’m feeling a bit hacky in what I’m doing:
contrast_trace = male_trace.copy()
contrast_trace.posterior = male_trace.posterior - female_trace.posterior
az.plot_posterior(contrast_trace);
az.summary(contrast_trace);
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
intercept 0.459 0.647 -0.799 1.637 0.015 0.010 1936.0 2179.0 1.0
scale_age_weight 0.144 0.102 -0.045 0.338 0.002 0.002 1965.0 2450.0 1.0
sigma 0.438 0.305 -0.137 0.993 0.006 0.005 2358.0 2484.0 1.0
This seems to work okay, but hardly feels like the right way to do it.
Does anyone have any suggestions to improve my solutions?