Hi, I’m struggling to fit an equivalent model in pymc to one specified using bambi. My code is below. The bambi model is producing different (and correct) results compared with the pymc model. I don’t understand how my model specification in pymc differs structurally than in bambi? Any insight would be most appreciated!
'''
White Tail Deer Dataset
Hefley et al. 2013
Available at https://www.stat.colostate.edu/~hooten/other/BBM2L_Code.zip
'''
deer_df = pd.read_csv('deer.csv', names=['body_mass', 'crop_size', 'sex', 'age', 'age_sq'],
header=0, index_col=0)
deer_df['intercept'] = 1.
# Replace crop_size with categories
deer_df['crop_size_cat'] = deer_df.crop_size.replace({599:0, 614:1})
# Define the x,y data
deer_x = deer_df[['crop_size_cat', 'sex', 'age', 'age_sq']].to_numpy()
deer_y = deer_df[['body_mass']].to_numpy()
# data dimensions
n = deer_x.shape[0] # data rows
p = deer_x.shape[1] # regression parameters
print(n, p)
%%time
# Run model using pymc
with pm.Model() as lm_deer1:
ϵ = pm.HalfNormal('ϵ', 10.)
α = pm.Normal('α', mu=39., sd=100.)
β = pm.Normal('β', mu=0., sd=100., shape=p)
# Linear predictor
μ = pm.Deterministic('μ', α + pm.math.dot(deer_x, β))
# Gaussian Likelihood
y_pred = pm.Normal('y_pred',
mu=μ,
sd=ϵ,
observed=deer_y)
# Sample
idata_lm_deer1 = pm.sample(1000)
# Numerical summary of parameters
az.summary(idata_lm_deer1, var_names=['ϵ', 'α', 'β'], round_to=2)
mean | sd | hdi_5.5% | hdi_94.5% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|
ϵ | 15.36 | 0.02 | 15.33 | 15.40 | 0.0 | 0.0 | 3561.41 | 2350.36 | 1.0 |
α | 39.53 | 0.08 | 39.40 | 39.65 | 0.0 | 0.0 | 1932.08 | 1855.40 | 1.0 |
β[0] | 0.00 | 0.07 | -0.11 | 0.11 | 0.0 | 0.0 | 3064.31 | 2528.50 | 1.0 |
β[1] | -0.00 | 0.07 | -0.11 | 0.11 | 0.0 | 0.0 | 2577.86 | 2322.27 | 1.0 |
β[2] | 0.00 | 0.05 | -0.08 | 0.08 | 0.0 | 0.0 | 1955.27 | 2208.42 | 1.0 |
β[3] | -0.00 | 0.01 | -0.01 | 0.01 | 0.0 | 0.0 | 2155.84 | 2205.65 | 1.0 |
# Fit the model with Bambi
lm_deer2 = bmb.Model("body_mass ~ crop_size_cat + sex + age + age_sq", deer_df)
idata_lm_deer2 = lm_deer2.fit(draws=2000)
# Numerical summary of traces
az.summary(idata_lm_deer2, round_to=2)
mean | sd | hdi_5.5% | hdi_94.5% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|
Intercept | 13.03 | 0.80 | 11.76 | 14.30 | 0.01 | 0.01 | 2892.27 | 2941.11 | 1.0 |
crop_size_cat | 0.74 | 0.71 | -0.47 | 1.81 | 0.01 | 0.01 | 4291.72 | 2460.97 | 1.0 |
sex | 8.20 | 0.71 | 7.07 | 9.32 | 0.01 | 0.01 | 3736.76 | 2665.11 | 1.0 |
age | 17.46 | 0.51 | 16.67 | 18.28 | 0.01 | 0.01 | 2178.99 | 2324.65 | 1.0 |
age_sq | -1.64 | 0.07 | -1.75 | -1.52 | 0.00 | 0.00 | 2161.45 | 2329.67 | 1.0 |
body_mass_sigma | 7.29 | 0.24 | 6.92 | 7.66 | 0.00 | 0.00 | 4147.96 | 2625.83 | 1.0 |
The values for α (intercept), β[0] (crop_size_cat), β[1] (sex), β[2] (age), β[3] (age_sq) are all very different between bambi and pymc and I’m not sure how to reproduce the bambi result using pymc? Thanks for any insight anyone can provide!
Last updated: Tue Feb 22 2022
Python implementation: CPython
Python version : 3.9.10
IPython version : 8.0.1
pandas : 1.4.1
pymc : 4.0.0b2
scipy : 1.7.3
matplotlib: 3.5.1
theano : 1.1.2
aesara : 2.3.8
arviz : 0.11.4
numpy : 1.22.2
Watermark: 2.3.0
deer.csv (12.5 KB)