Anova - MathAch dataset of 160 schools with math achievements of students

alex7and7er · November 17, 2023, 9:42pm

Hello! I am new to Bayesian statistics and am currently following a book called “Bayesian Analysis for the
Social Sciences”. In chapter 7, a hierarchical model is used in example 7.6 to conduct a one-way ANOVA. In the book the following code is presented:

I’ve built the same model (I guess) using pymc:

school_idx,schools=pd.factorize(df['School'])
coords = {"school": schools,"obs_id": np.arange(len(df))}
model=pm.Model(coords=coords)
with model:
    school = pm.ConstantData("shool_idx", school_idx, dims="obs_id")
    sigmas=pm.Uniform('sigmas',0,10,shape=2)
    tau=pm.Deterministic('tau',sigmas**(-2))
    mu=pm.Normal('mu',mu=0,sigma=0.0001**0.5)
    alphas=pm.Normal('alphas',mu=mu,sigma=tau[1]**0.5,dims='school')
    math=pm.Normal('math',mu=alphas[school],sigma=tau[0]**0.5,observed=df['MathAch2'],dims='obs_id')

However, the book suggests the following results:
mu=12.64
omega=2.96 (‘between’ standard deviation)
sigma=6.23 (‘within’ standard deviation)

I got this from az.summary(idata):
mu=0.001
sigmas[0]=0.160
sigmas[1]=0.077

I do not understand why mu is so low; it should be much higher, since the alphas are drawn from a distribution whose mean is mu. Yet no alpha is below zero; they are approximately in the interval (4, 20).

I must have made a mistake somewhere. But I cannot see where exactly. Please help!

cluhmann · November 17, 2023, 11:37pm

Welcome!

Do you have the data handy?

Simon · November 18, 2023, 12:36am

Maybe the sigma parameter of your mu is causing the issue? An SD of 0.01 (that is, 0.0001**0.5) is quite narrow for a Gaussian prior, so maybe a standard Gaussian provides better coverage (it is always better to conduct prior predictive checks). Just by changing that parameter you can recover the central tendency of simulated data:

import numpy as np
import arviz as az
import pymc as pm

np.random.seed(27)

M = np.random.normal(12, 3, 160*32) #mean 12 and SD 3

school_idx = np.array([np.repeat(i, 32) for i in range(160)]).flatten()

schools = np.arange(160).astype("str")

coords = {"school": schools,"obs_id":np.arange(len(M))}

with pm.Model(coords=coords) as model:
    school = pm.ConstantData("shool_idx", school_idx, dims="obs_id")
    sigmas = pm.Uniform('sigmas', 0, 10, shape=2)
    tau = pm.Deterministic('tau', sigmas**(-2))
    mu = pm.Normal('mu', mu=0, sigma=1)
    alphas = pm.Normal('alphas', mu=mu, sigma=tau[1]**0.5, dims='school')
    math = pm.Normal('math', mu=alphas[school], sigma=tau[0]**0.5, observed=M, dims='obs_id')
    
with model:
    idata = pm.sample(nuts_sampler="numpyro", random_seed=27)
    
az.summary(idata, var_names=['mu','sigmas','tau'])

Running chain 0: 100%|██████████| 2000/2000 [00:06<00:00, 325.59it/s]
Running chain 1: 100%|██████████| 2000/2000 [00:06<00:00, 325.70it/s]
Running chain 2: 100%|██████████| 2000/2000 [00:06<00:00, 325.86it/s]
Running chain 3: 100%|██████████| 2000/2000 [00:06<00:00, 326.02it/s]
Sampling time =  0:00:06.318887
Transforming variables...
Transformation time =  0:00:00.116303
Out[31]: 
             mean     sd  hdi_3%  hdi_97%  ...  mcse_sd  ess_bulk  ess_tail  r_hat
mu         11.997  0.045  11.916   12.079  ...    0.002     389.0     623.0   1.01
sigmas[0]   0.335  0.003   0.328    0.341  ...    0.000    5458.0    3010.0   1.00
sigmas[1]   7.631  1.544   5.048    9.997  ...    0.082     184.0     318.0   1.03
tau[0]      8.897  0.177   8.556    9.227  ...    0.002    5458.0    3010.0   1.00
tau[1]      0.020  0.010   0.010    0.039  ...    0.001     184.0     318.0   1.03

[5 rows x 9 columns]

It may also be possible to remove the sigmas and assign tau directly as a HalfNormal. That may help to better recover the total SD of the data (if that’s desired):

with pm.Model(coords=coords) as model:
    school = pm.ConstantData("shool_idx", school_idx, dims="obs_id")
    tau = pm.HalfNormal('tau', 1, shape=2) 
    mu = pm.Normal('mu', mu=0, sigma=1)
    alphas = pm.Normal('alphas', mu=mu, sigma=tau[1], dims='school')
    math = pm.Normal('math', mu=alphas[school], sigma=tau[0], observed=M, dims='obs_id')
    
with model:
    idata = pm.sample(nuts_sampler="numpyro", random_seed=27)
    
az.summary(idata, var_names=['mu','tau'])

Running chain 0: 100%|██████████| 2000/2000 [00:07<00:00, 277.07it/s]
Running chain 1: 100%|██████████| 2000/2000 [00:07<00:00, 277.15it/s]
Running chain 2: 100%|██████████| 2000/2000 [00:07<00:00, 277.26it/s]
Running chain 3: 100%|██████████| 2000/2000 [00:07<00:00, 277.42it/s]
Sampling time =  0:00:07.397331
Transforming variables...
Transformation time =  0:00:00.078383
Out[34]: 
          mean     sd  hdi_3%  hdi_97%  ...  mcse_sd  ess_bulk  ess_tail  r_hat
mu      11.999  0.047  11.908   12.085  ...    0.003     134.0     401.0   1.04
tau[0]   2.982  0.030   2.922    3.035  ...    0.001    1507.0    1982.0   1.00
tau[1]   0.115  0.059   0.025    0.216  ...    0.012      12.0      22.0   1.30

[3 rows x 9 columns]

Hope this makes sense and helps a bit.
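
The prior predictive check mentioned above can be sketched without PyMC, using only the standard library. This is a rough illustration, not the book's or PyMC's procedure: the helper names `prior_mu_draws` and `share_near` are made up for this example, the target of 12 is the mean of the simulated data, and the SD of 100 assumes the book's 0.0001 is a precision rather than a variance.

```python
import random

def prior_mu_draws(sd, n=10_000, seed=27):
    """Draw n values of mu from a Normal(0, sd) prior (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sd) for _ in range(n)]

def share_near(draws, target=12.0, tol=6.0):
    """Fraction of prior draws that land within tol of the target value."""
    return sum(abs(d - target) <= tol for d in draws) / len(draws)

tight = prior_mu_draws(0.01)   # SD used in the original post (0.0001**0.5)
wide = prior_mu_draws(100.0)   # SD implied if 0.0001 is read as a precision

# Largest |mu| the tight prior ever proposes: nowhere near 12.
print(max(abs(d) for d in tight))
# Share of prior draws in the plausible range, tight prior vs wide prior.
print(share_near(tight), share_near(wide))
```

If the tight prior never places mu anywhere near the scale of the data, the posterior for mu has very little room to move, which is a hint that the prior scale deserves a second look.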

jessegrabowski · November 18, 2023, 8:54am

I think Simon is right that the problem is the variance parameter. iirc BUGS/JAGS always parameterizes Normal distributions with the precision (tau), rather than the standard deviation (sigma). This is also available in PyMC. You can do this instead:

    alphas=pm.Normal('alphas',mu=mu,tau=tau[1],dims='school')

tau and sigma are linked by tau = 1/sigma^2, so if you pass a small tau to the sigma argument you’ll end up with ultra-informative priors.
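
To make the precision/SD relationship concrete, here is a small sketch using the usual definition tau = 1/sigma^2. The function names `precision_to_sd` and `sd_to_precision` are made up for this example; the numeric inputs are the 0.0001 from the original post and the within/between SDs reported by the book.

```python
import math

def precision_to_sd(tau):
    """Convert a precision (tau) to a standard deviation: sigma = 1/sqrt(tau)."""
    return 1.0 / math.sqrt(tau)

def sd_to_precision(sigma):
    """Convert a standard deviation to a precision: tau = 1/sigma**2."""
    return sigma ** -2

book_value = 0.0001

# Reading 0.0001 as a precision (BUGS/JAGS convention) gives a very wide prior.
sd_if_precision = precision_to_sd(book_value)

# Reading it as a variance, as the original model did, gives a very narrow one.
sd_as_coded = book_value ** 0.5

print(sd_if_precision, sd_as_coded)

# Precisions corresponding to the book's reported SDs (within, between).
print(sd_to_precision(6.23), sd_to_precision(2.96))
```

The two readings of the same number differ by four orders of magnitude in SD, which is why the choice of argument (sigma vs tau) matters so much here.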

alex7and7er · November 18, 2023, 9:40am

Here is the data
MathAchieve.csv (271.2 KB)

alex7and7er · November 18, 2023, 9:46am

Thank you very much, I’ll try it with a standard Gaussian then. But it’s the book that offers such a narrow sigma (now I understand it’s actually tau) parameter. I guess I was wrong when I thought it was sigma.

alex7and7er · November 18, 2023, 10:35am

Thank you for explaining the JAGS specifics. Now I get results very close to the book's. However, I have one question. How can an $\alpha_i \sim N(\mu, 1/\tau_1)$ be in the interval (4, 20) when $\mu = 0$ and $1/\tau_1$ is very small? It doesn’t make sense to me. If the details are too much to be listed in one post, could you please suggest some reading material?

cluhmann · November 18, 2023, 1:49pm

I think what is happening is that your specification basically implied that mu is nearly 100% guaranteed to be very close to zero. But how close each alpha was to mu wasn’t nearly as certain (i.e., tau[1] was allowed to vary). So your posterior still indicates that mu is basically zero (because that’s what your prior implied), and the data pulled each alpha up toward the empirical mean of each respective school’s df['MathAch']. But I just skimmed through quickly, so I may have missed something.
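
A back-of-the-envelope version of this argument, as a sketch only: treat the 160 school effects as direct observations of mu with a fixed between-school SD and apply the standard normal-normal conjugate update. The function name `posterior_mean_mu` is made up here, and the inputs are assumptions taken from the thread: the between-school SD is set to 1/0.077 (what sigma=tau[1]**0.5 implies for the reported sigmas[1]), and the book's mu of 12.64 stands in for the mean of the school effects.

```python
def posterior_mean_mu(prior_sd, between_sd, n_schools, mean_of_school_effects,
                      prior_mean=0.0):
    """Normal-normal conjugate posterior mean for mu, with between_sd held fixed."""
    prior_precision = prior_sd ** -2
    data_precision = n_schools / between_sd ** 2
    weighted_sum = (prior_precision * prior_mean
                    + data_precision * mean_of_school_effects)
    return weighted_sum / (prior_precision + data_precision)

between_sd = 1 / 0.077   # alpha SD implied by the reported sigmas[1] (assumption)

# Prior SD of 0.01, as in the original model: the prior swamps the data.
tight = posterior_mean_mu(0.01, between_sd, 160, 12.64)

# Prior SD of 100 (0.0001 read as a precision): the data dominate.
wide = posterior_mean_mu(100.0, between_sd, 160, 12.64)

print(tight, wide)
```

Under these simplifications the tight prior keeps mu within a few thousandths of zero, while the alphas can still sit far from mu because their SD is large. The real model estimates the SDs jointly, so this is only a rough consistency check.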

alex7and7er · November 18, 2023, 2:29pm

Thank you, I guess, you have a point. The community team here is wonderful:)
