Eight school problem with student t distribution for treatment effects


I am trying to reproduce the model fitted in this paper https://arxiv.org/pdf/1710.03410.pdf i.e. fitting a hierarchical model on results from past experiments.

In essence, it is the same as the eight school problems.

I am specifically interested in the shape of the distribution of treatment effects and in particular its tails, and thus would like to use a student distribution rather than a normal distribution.

Reproducing the non-centered parametrization of the eight school case study and only adding nu and slightly modifying theta_tilde gives :

import numpy as np
import pymc3 as pm
import arviz as az

J = 8
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

with pm.Model() as NonCentered_eight:
    mu = pm.Normal("mu", mu=0, sigma=5)
    tau = pm.HalfCauchy("tau", beta=5)
    nu = pm.Gamma("nu", 2,0.1)
    theta_tilde = pm.StudentT("theta_t", mu=0, sigma=1,nu=nu, shape=J)
    theta = pm.Deterministic("theta", mu + tau * theta_tilde)
    obs = pm.Normal("obs", mu=theta, sigma=sigma, observed=y)
    prior = pm.sample_prior_predictive(samples=10000)
    trace = pm.sample(tune=15000,target_accept=0.99)
    posterior_predictive = pm.sample_posterior_predictive(trace)


In the eight school problem, it works like a charm.

In my use case, it does not work.

Using priors matching close to those from the paper (p6) (I have also tried many other things on the prior, but it never really worked), what I have is :

#note : in my dataset, a 1% lift is coded as y=0.01 while in the paper it is coded as y=1
with pm.Model() as NonCentered_eight:
    mu = pm.Normal("mu", mu=0, sigma=0.025)
    tau = pm.HalfCauchy("tau", beta=1/100)
    nu = pm.Uniform("nu", lower=1.01,upper=4)
    theta_tilde = pm.StudentT("theta_t", mu=0, sigma=1,nu=nu, shape=df.shape[0])
    theta = pm.Deterministic("theta", mu + tau * theta_tilde)
    obs = pm.Normal("obs", mu=theta, sigma=df['lift_in_perc_standard_error_delta_method'].to_numpy(), observed=df['lift_in_perc_point_estimate'].to_numpy())
    prior = pm.sample_prior_predictive(samples=10000)
    trace = pm.sample(tune=60000,target_accept=0.99)
    posterior_predictive = pm.sample_posterior_predictive(trace)   

The warning I have is “the estimated number of effective samples is smaller than 200 for some parameters” ; tau, mu and some thetas have a 1.05 > r_hat > 1.01.

I don’t know if it’s relevant, but :
a°) my number of experiments is high (around 1000)
b°) my posterior checks are good, it does really looks like my empirical distribution.
c°) tau and nu are correlated across samples

I was wondering if there was :
1°) something plain wrong already noticeable in what I am doing ?
2°) a known reparametrization / reformulation when you want to model random effects coming from student t distribution ?
3°) a good case study with a model where the distribution of random effects is not normal ?

I hope I haven’t missed something already posted somewhere.

In any case, I wish you a very good week-end.

Thanks a lot for your help,


1 Like


This is expected. Requiring thin tails (large nu) forces the scale to be increased (tau to be decreased) in order to accommodate outlying observations.

As for the poor sampling, it could be many things. It could be the amount of data you have per level/category in the hierarchy (e.g., per school), it could be something about the distribution of data with a level category, it could be the distribution of means/SDs across categories, etc. Have you tried reducing target_accept? That may improve mixing.

As for hierarchical models that use non-normal “random effects” here is one example.