Hello,
I am trying to reproduce the model fitted in this paper https://arxiv.org/pdf/1710.03410.pdf i.e. fitting a hierarchical model on results from past experiments.
In essence, it is the same as the eight school problems.
I am specifically interested in the shape of the distribution of treatment effects and in particular its tails, and thus would like to use a student distribution rather than a normal distribution.
Reproducing the non-centered parametrization of the eight school case study and only adding nu and slightly modifying theta_tilde gives :
import numpy as np
import pymc3 as pm
import arviz as az
J = 8
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])
with pm.Model() as NonCentered_eight:
mu = pm.Normal("mu", mu=0, sigma=5)
tau = pm.HalfCauchy("tau", beta=5)
nu = pm.Gamma("nu", 2,0.1)
theta_tilde = pm.StudentT("theta_t", mu=0, sigma=1,nu=nu, shape=J)
theta = pm.Deterministic("theta", mu + tau * theta_tilde)
obs = pm.Normal("obs", mu=theta, sigma=sigma, observed=y)
prior = pm.sample_prior_predictive(samples=10000)
trace = pm.sample(tune=15000,target_accept=0.99)
posterior_predictive = pm.sample_posterior_predictive(trace)
inf_data=az.from_pymc3(prior=prior,trace=trace,posterior_predictive=posterior_predictive)
model_summary=az.summary(trace)
model_summary
In the eight school problem, it works like a charm.
In my use case, it does not work.
Using priors matching close to those from the paper (p6) (I have also tried many other things on the prior, but it never really worked), what I have is :
#note : in my dataset, a 1% lift is coded as y=0.01 while in the paper it is coded as y=1
with pm.Model() as NonCentered_eight:
mu = pm.Normal("mu", mu=0, sigma=0.025)
tau = pm.HalfCauchy("tau", beta=1/100)
nu = pm.Uniform("nu", lower=1.01,upper=4)
theta_tilde = pm.StudentT("theta_t", mu=0, sigma=1,nu=nu, shape=df.shape[0])
theta = pm.Deterministic("theta", mu + tau * theta_tilde)
obs = pm.Normal("obs", mu=theta, sigma=df['lift_in_perc_standard_error_delta_method'].to_numpy(), observed=df['lift_in_perc_point_estimate'].to_numpy())
prior = pm.sample_prior_predictive(samples=10000)
trace = pm.sample(tune=60000,target_accept=0.99)
posterior_predictive = pm.sample_posterior_predictive(trace)
inf_data=az.from_pymc3(prior=prior,trace=trace,posterior_predictive=posterior_predictive)
model_summary=az.summary(trace)
The warning I have is “the estimated number of effective samples is smaller than 200 for some parameters” ; tau, mu and some thetas have a 1.05 > r_hat > 1.01.
I don’t know if it’s relevant, but :
a°) my number of experiments is high (around 1000)
b°) my posterior checks are good, it does really looks like my empirical distribution.
c°) tau and nu are correlated across samples
I was wondering if there was :
1°) something plain wrong already noticeable in what I am doing ?
2°) a known reparametrization / reformulation when you want to model random effects coming from student t distribution ?
3°) a good case study with a model where the distribution of random effects is not normal ?
I hope I haven’t missed something already posted somewhere.
In any case, I wish you a very good week-end.
Thanks a lot for your help,
Timothée