This is a very interesting case. Thank you for taking the time to write about this.
I’m not entirely sure yet what is going on here, but here is what I’m currently thinking.
First, to give people an idea of what this looks like:
As far as I can tell, the problem is that the posterior really is quite complicated. There isn’t enough information in the dataset to get tight estimates for some of the scale parameters, or even for the global intercept. This leads to long tails, correlations, and some funnel-like behaviour in the posterior. The long tails are what you see as “spikes” in the trace; the funnels and correlations are visible in scatter plots between different variables.
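To see the spikes yourself, the standard trace plot works well (a quick sketch, not from the original post; it assumes a trace sampled from the reparametrized model below, using the renamed variables):
import pymc3 as pm

# The long tails show up as occasional extreme excursions ("spikes")
# in the traces of the intercept and the scale parameters.
pm.traceplot(trace, varnames=['intercept', 'subject_sd_raw'])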
I experimented a bit with different parametrizations, and this one seems to work (it doesn’t give me divergences on master). I also renamed your variables to my own convention, which made it easier to work with. I hope that’s not too confusing.
import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as hierarchical_model:
    intercept = pm.Normal('intercept', mu=0., sd=5)

    # HalfCauchy(0.2) prior on the subject sd, reparametrized as
    # 0.2 * tan(Uniform(0, pi/2)); see the note below the model.
    subject_sd_raw = pm.Uniform('subject_sd_raw', upper=np.pi / 2)
    subject_sd = 0.2 * tt.tan(subject_sd_raw)
    # Non-centered parametrization of the per-subject offsets.
    subject = pm.Normal('subject', mu=0, sd=1, shape=n_subjects)

    altered_mu = pm.Normal('altered_mu', mu=0, sd=1)
    altered_subject_sd_raw = pm.Uniform('altered_subject_sd_raw', upper=np.pi / 2)
    altered_subject_sd = tt.tan(altered_subject_sd_raw)
    altered_subject = pm.Normal('altered_subject', mu=0, sd=1, shape=n_subjects)

    is_altered = data.is_altered.values
    estimator = (intercept
                 + subject_sd * subject[subject_idx]
                 + altered_mu * is_altered
                 + altered_subject_sd * altered_subject[subject_idx] * is_altered)

    residual = pm.HalfCauchy('residual', 5)
    nu = pm.Gamma('nu', 2, 0.1)
    likelihood = pm.StudentT('y', nu=nu, mu=estimator,
                             sd=residual, observed=data['measurement1'])
I used a little trick to reparameterize subject_sd: tan(Uniform(0, pi/2)) is half-Cauchy distributed, since for U ~ Uniform(0, pi/2) we have P(tan(U) <= x) = arctan(x) / (pi/2), which is exactly the HalfCauchy(1) CDF.
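If you want to convince yourself numerically (just a quick check, not part of the model), you can compare draws against scipy’s half-Cauchy:
import numpy as np
from scipy import stats

np.random.seed(42)
u = np.random.uniform(0, np.pi / 2, size=100000)
tan_samples = np.tan(u)
hc_samples = stats.halfcauchy.rvs(size=100000)

# A two-sample KS test should not reject: both are HalfCauchy(1) draws.
print(stats.ks_2samp(tan_samples, hc_samples))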
I sample with
with hierarchical_model:
    trace = pm.sample(2000, tune=1000, njobs=2,
                      nuts_kwargs={'target_accept': 0.95})
This is still far from perfect, e.g. look at this scatter plot:
import matplotlib.pyplot as plt

plt.scatter(trace['altered_mu'],
            trace['altered_subject_sd_raw_interval__'], marker='.')
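If you want to see where exactly the sampler runs into trouble (a sketch, not from the original post), you can color the draws by the 'diverging' sampler statistic that NUTS records:
# Divergent transitions tend to cluster in the problematic region,
# e.g. the neck of a funnel.
divergent = trace.get_sampler_stats('diverging')
plt.scatter(trace['altered_mu'],
            trace['altered_subject_sd_raw_interval__'],
            marker='.', c=divergent, cmap='coolwarm')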
Model fit
Part of the problem here might also be that the model doesn’t fit the data all too well. Does it really make sense to model the change in the measurement between altered and not altered as a normal? Just from looking at the data, it seems plausible that the alteration leaves a lot of subjects mostly unchanged but affects some by a larger amount. If we assume that only a few subjects are really affected, then we can use a scale-mixture prior like the horseshoe for the effect sizes:
with pm.Model() as hierarchical_model:
    intercept = pm.Normal('intercept', mu=2.5, sd=1)

    # subject_sd = pm.HalfNormal('subject_sd', sd=5)
    subject_sd_raw = pm.Uniform('subject_sd_raw', upper=np.pi / 2)
    subject_sd = tt.tan(subject_sd_raw)
    subject = pm.Normal('subject', mu=0, sd=1, shape=n_subjects)

    # Horseshoe prior on the per-subject effects: a global scale tau,
    # local half-Cauchy scales theta (again via tan(uniform)), and
    # standard normal eta.
    altered_subject_tau = pm.HalfNormal(
        'altered_subject_tau', sd=0.2)
    # altered_subject_theta = pm.HalfStudentT(
    #     'altered_subject_theta', nu=3, sd=1, shape=n_subjects)
    altered_subject_theta_raw = pm.Uniform(
        'altered_subject_theta_raw', upper=np.pi / 2, shape=n_subjects)
    altered_subject_theta = tt.tan(altered_subject_theta_raw)
    altered_subject_eta = pm.Normal(
        'altered_subject_eta', mu=0, sd=1, shape=n_subjects)
    altered_subject = pm.Deterministic(
        'altered_subject',
        altered_subject_tau * altered_subject_theta * altered_subject_eta)

    is_altered = data.is_altered.values
    estimator = (intercept
                 + subject_sd * subject[subject_idx]
                 + altered_subject[subject_idx] * is_altered)

    residual = pm.HalfCauchy('residual', 5)
    likelihood = pm.StudentT('y', nu=7, mu=estimator,
                             sd=residual, observed=data['measurement1'])
This leads to effect size estimates like this:
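A plot like that can be produced with the built-in forest plot (a sketch, assuming a trace sampled from the horseshoe model above):
# Per-subject effect size estimates with credible intervals.
pm.forestplot(trace, varnames=['altered_subject'])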
I guess it is pretty hard to say anything meaningful about the distribution of altered_subject with just 4 samples. Some domain-specific knowledge might help, though.
PyMC3 versions
About the changes between the pymc3 versions: at the moment I’m pretty much at a loss as to why it would seem to work with 3.0. My best guess so far is that it actually doesn’t; there just weren’t the diagnostics around back then to tell you. But really, I’ll have to look into that.
Model checking
You should also compare the dataset with posterior predictive samples, and use that to check the likelihood. The choice of nu=7 was pretty much blind and might not make much sense. From the plot, maybe nu should be a bit smaller?
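A minimal sketch of such a check (pm.sample_ppc is the posterior predictive sampler in this pymc3 version; newer releases rename it to pm.sample_posterior_predictive):
with hierarchical_model:
    ppc = pm.sample_ppc(trace, samples=500)

# Compare the posterior predictive distribution to the observed data.
plt.hist(ppc['y'].ravel(), bins=50, density=True, alpha=0.5,
         label='posterior predictive')
plt.hist(data['measurement1'], bins=20, density=True, alpha=0.5,
         label='observed')
plt.legend()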
Altogether, this looks like a perfect example for Riemannian Hamiltonian methods (small number of parameters, really complicated posterior). If we ever get those to work properly, is the dataset public? It might make a nice case study.


