# ANOVA - MathAch dataset of 160 schools with students' math achievement

Hello! I am new to Bayesian statistics and currently following the book "Bayesian Analysis for the Social Sciences". In chapter 7, example 7.6 uses a hierarchical model to conduct a one-way ANOVA, with accompanying code presented in the book.

I've built what I believe is the same model using PyMC:

```python
school_idx, schools = pd.factorize(df['School'])
coords = {"school": schools, "obs_id": np.arange(len(df))}
model = pm.Model(coords=coords)
with model:
    school = pm.ConstantData("school_idx", school_idx, dims="obs_id")
    sigmas = pm.Uniform('sigmas', 0, 10, shape=2)
    tau = pm.Deterministic('tau', sigmas**(-2))
    mu = pm.Normal('mu', mu=0, sigma=0.0001**0.5)
    alphas = pm.Normal('alphas', mu=mu, sigma=tau[1]**0.5, dims='school')
    math = pm.Normal('math', mu=alphas[school], sigma=tau[0]**0.5, observed=df['MathAch2'], dims='obs_id')
```


However, the book reports the following results:

- mu = 12.64
- omega = 2.96 (the 'between'-school standard deviation)
- sigma = 6.23 (the 'within'-school standard deviation)

I got this from az.summary(idata):

- mu = 0.001
- sigmas[0] = 0.160
- sigmas[1] = 0.077

I do not understand why mu is so low; it should be much higher, since the alphas are drawn from a distribution with mean mu. Yet no alpha is below zero; they fall approximately in the interval (4, 20).

Welcome!

Do you have the data handy?

Maybe the sigma parameter of your mu prior is causing the issue? An SD of 0.01 for a Gaussian is very narrow, so a standard Gaussian may provide better coverage (it's always better to run prior predictive checks). Just by changing that parameter you can recover the central tendency of simulated data:

```python
import numpy as np
import arviz as az
import pymc as pm

np.random.seed(27)

M = np.random.normal(12, 3, 160 * 32)  # mean 12 and SD 3

school_idx = np.repeat(np.arange(160), 32)

schools = np.arange(160).astype("str")

coords = {"school": schools, "obs_id": np.arange(len(M))}

with pm.Model(coords=coords) as model:
    school = pm.ConstantData("school_idx", school_idx, dims="obs_id")
    sigmas = pm.Uniform('sigmas', 0, 10, shape=2)
    tau = pm.Deterministic('tau', sigmas**(-2))
    mu = pm.Normal('mu', mu=0, sigma=1)
    alphas = pm.Normal('alphas', mu=mu, sigma=tau[1]**0.5, dims='school')
    math = pm.Normal('math', mu=alphas[school], sigma=tau[0]**0.5, observed=M, dims='obs_id')

with model:
    idata = pm.sample(nuts_sampler="numpyro", random_seed=27)

az.summary(idata, var_names=['mu', 'sigmas', 'tau'])
```

```
Running chain 0: 100%|██████████| 2000/2000 [00:06<00:00, 325.59it/s]
Running chain 1: 100%|██████████| 2000/2000 [00:06<00:00, 325.70it/s]
Running chain 2: 100%|██████████| 2000/2000 [00:06<00:00, 325.86it/s]
Running chain 3: 100%|██████████| 2000/2000 [00:06<00:00, 326.02it/s]
Sampling time =  0:00:06.318887
Transforming variables...
Transformation time =  0:00:00.116303

             mean     sd  hdi_3%  hdi_97%  ...  mcse_sd  ess_bulk  ess_tail  r_hat
mu         11.997  0.045  11.916   12.079  ...    0.002     389.0     623.0   1.01
sigmas[0]   0.335  0.003   0.328    0.341  ...    0.000    5458.0    3010.0   1.00
sigmas[1]   7.631  1.544   5.048    9.997  ...    0.082     184.0     318.0   1.03
tau[0]      8.897  0.177   8.556    9.227  ...    0.002    5458.0    3010.0   1.00
tau[1]      0.020  0.010   0.010    0.039  ...    0.001     184.0     318.0   1.03

[5 rows x 9 columns]
```
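The effect of the narrow prior can also be checked with plain numpy draws (a hypothetical sanity check, not from the book or the thread's models): with sigma = 0.01, essentially no prior mass for mu lands anywhere near 12, so the posterior cannot move there.

```python
import numpy as np

rng = np.random.default_rng(27)

# Prior draws for mu under the two specifications discussed above
narrow = rng.normal(0, 0.0001**0.5, 10_000)  # sigma = 0.01, as in the original model
standard = rng.normal(0, 1, 10_000)          # the standard Gaussian suggested instead

# The narrow prior keeps essentially all draws within +/- 0.06 of zero,
# so a posterior mean anywhere near 12 is unreachable under it
print(np.abs(narrow).max() < 0.06)
print(np.abs(standard).max() > 1)
```

Both prints come out True: the widest narrow-prior draw is a few hundredths, while the standard Gaussian easily covers the scale of the data once it is standardized or the prior is widened further.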



It may also be possible to remove the sigmas and assign tau directly as a HalfNormal (used here as an SD rather than a precision)? That may help to better recover the data's total SD (if that's desired):

```python
with pm.Model(coords=coords) as model:
    school = pm.ConstantData("school_idx", school_idx, dims="obs_id")
    tau = pm.HalfNormal('tau', 1, shape=2)
    mu = pm.Normal('mu', mu=0, sigma=1)
    alphas = pm.Normal('alphas', mu=mu, sigma=tau[1], dims='school')
    math = pm.Normal('math', mu=alphas[school], sigma=tau[0], observed=M, dims='obs_id')

with model:
    idata = pm.sample(nuts_sampler="numpyro", random_seed=27)

az.summary(idata, var_names=['mu', 'tau'])
```

```
Running chain 0: 100%|██████████| 2000/2000 [00:07<00:00, 277.07it/s]
Running chain 1: 100%|██████████| 2000/2000 [00:07<00:00, 277.15it/s]
Running chain 2: 100%|██████████| 2000/2000 [00:07<00:00, 277.26it/s]
Running chain 3: 100%|██████████| 2000/2000 [00:07<00:00, 277.42it/s]
Sampling time =  0:00:07.397331
Transforming variables...
Transformation time =  0:00:00.078383

          mean     sd  hdi_3%  hdi_97%  ...  mcse_sd  ess_bulk  ess_tail  r_hat
mu      11.999  0.047  11.908   12.085  ...    0.003     134.0     401.0   1.04
tau[0]   2.982  0.030   2.922    3.035  ...    0.001    1507.0    1982.0   1.00
tau[1]   0.115  0.059   0.025    0.216  ...    0.012      12.0      22.0   1.30

[3 rows x 9 columns]
```


Hope this makes sense and helps a bit.


I think Simon is right that the problem is the variance parameter. IIRC, BUGS/JAGS always parameterizes Normal distributions with the precision (tau) rather than the standard deviation (sigma). The precision parameterization is also available in PyMC. You can do this instead:

    alphas = pm.Normal('alphas', mu=mu, tau=tau[1], dims='school')


tau and sigma are linked by tau = 1/sigma², so if you pass a precision to the sigma argument you'll end up with ultra-informative priors.
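That relationship can be verified with a small numpy-only check (independent of PyMC): the two parameterizations of the normal log-density agree only when tau = 1/sigma², not when tau = 1/sigma.

```python
import numpy as np

def normal_logpdf_sigma(x, mu, sigma):
    # standard-deviation parameterization
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def normal_logpdf_tau(x, mu, tau):
    # BUGS/JAGS-style precision parameterization
    return 0.5 * np.log(tau / (2 * np.pi)) - 0.5 * tau * (x - mu) ** 2

x = np.linspace(-3, 3, 7)
sigma = 2.0
print(np.allclose(normal_logpdf_sigma(x, 0, sigma),
                  normal_logpdf_tau(x, 0, 1 / sigma**2)))  # True: tau = 1/sigma**2
print(np.allclose(normal_logpdf_sigma(x, 0, sigma),
                  normal_logpdf_tau(x, 0, 1 / sigma)))     # False: tau = 1/sigma is wrong
```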


Here is the data
MathAchieve.csv (271.2 KB)

Thank you very much, I'll try a standard Gaussian then. But it's the book that uses such a narrow parameter; now I understand it is actually tau, and I was wrong to read it as sigma.

Thank you for explaining the JAGS specifics. Now I get results very close to the book's. However, I have one question: how can \alpha_i\sim N(\mu,1/\tau_1) lie in the interval (4, 20) when \mu=0 and 1/\tau_1 is very small? It doesn't make sense to me. If the details are too much for one post, could you please suggest some reading material?

I think what is happening is that your specification basically implied that mu is nearly 100% guaranteed to be very close to zero. But how close each alpha was to mu wasn't nearly as certain (i.e., tau[1] was allowed to vary). So your posterior still indicates that mu is basically zero (because that's what your prior implied), while the data pulled each alpha up toward the empirical mean of its respective school in df['MathAch']. But I just skimmed through quickly, so I may have missed something.
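For intuition, here is a rough conjugate-normal calculation with illustrative numbers (the grand mean and between-school SD are taken loosely from the book's results, not computed from the data): a N(0, 0.01²) prior gives mu a prior precision of 10,000, which swamps the information contributed by the 160 school effects, so the posterior mean for mu barely moves from zero.

```python
# Conjugate update for a normal mean with known variance (illustrative numbers)
prior_mu = 0.0
prior_tau = 1 / 0.01**2   # precision 10_000 implied by the sigma = 0.01 prior
xbar = 12.6               # roughly the grand mean reported in the book
n = 160                   # number of school-level effects informing mu
lik_tau = 1 / 3.0**2      # assumed between-school SD of about 3

post_mu = (prior_tau * prior_mu + n * lik_tau * xbar) / (prior_tau + n * lik_tau)
print(round(post_mu, 3))  # ~0.022: the posterior for mu stays pinned near zero
```

This matches the pattern in the original summary, where mu landed at 0.001 while the alphas were free to move to the school means.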


Thank you, I guess you have a point. The community team here is wonderful :)
