Divergences in a non-centered implementation of the normal/normal hierarchical model

I’m implementing a version of the 8 schools model in pymc. The model code and data are below:

import numpy as np
import pymc as pm


N = 50_000
treatment_conversions = np.array([541, 557, 559, 556, 530, 532, 516, 532, 528, 544, 519, 552])
control_conversions = np.array([496, 524, 486, 500, 516, 475, 507, 475, 490, 506, 512, 489])
log_estimated_rr = np.log(treatment_conversions / control_conversions)
# Delta-method variance of log(RR) is approximately 1/x_t + 1/x_c - 2/N;
# take the square root to get the standard deviation.
sd_log_estimated_rr = np.sqrt(1 / treatment_conversions + 1 / control_conversions - 2 / N)
experiment = [f"Experiment_{i}" for i in range(treatment_conversions.size)]

with pm.Model(coords={'experiment': experiment}) as model:
    mu = pm.Normal('mu', mu=0, sigma=1)
    tau = pm.HalfCauchy('tau', 1)
    z = pm.Normal('z', 0, 1, dims='experiment')
    log_RR = pm.Normal('log_RR', mu + tau * z, sd_log_estimated_rr, observed=log_estimated_rr)
    
with model:
    trace = pm.sample(cores=4, chains=4)

This code is adapted from the pymc documentation. When I run the model, I get divergences, which is strange given that I’ve used the non-centered parameterization. The author(s) of the docs have probably run into the same thing, since they set target_accept to 0.9.
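For reference, the divergence count can be read off the returned trace (a minimal check, assuming pm.sample returns its default InferenceData):

# Each draw carries a boolean "diverging" flag in the sample_stats group.
n_divergent = int(trace.sample_stats["diverging"].sum())
print(f"{n_divergent} divergent transitions")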

This seems odd. The same model written in Stan does not experience any divergences and runs in under a second, whereas the pymc model takes ~30 seconds.

import cmdstanpy


model_data = {
    'n': 12,
    'log_estimated_rr': log_estimated_rr.tolist(),
    'sd_log_estimated_rr': sd_log_estimated_rr.tolist()
}

model_code = '''
data {
  // For model fitting
  int n;                          // number of observations in data
  vector[n] log_estimated_rr;     // log relative lift estimated from each experiment
  vector[n] sd_log_estimated_rr;  // standard deviation of the log relative lift from each experiment
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[n] z;
}
transformed parameters {
  // Non-centered parameterization: implied experiment-level effects
  vector[n] true_log_rr = mu + z * tau;
}
model {
  mu ~ std_normal();
  tau ~ cauchy(0, 1);  // half-Cauchy via the lower=0 constraint
  z ~ std_normal();
  log_estimated_rr ~ normal(true_log_rr, sd_log_estimated_rr);
}
'''

with open('model.stan', 'w') as file:
    file.write(model_code)
    
model = cmdstanpy.CmdStanModel(stan_file='model.stan')

fit = model.sample(data=model_data)
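The equivalent check on the Stan side, using cmdstanpy’s wrapper around CmdStan’s diagnose utility:

# diagnose() runs CmdStan's bin/diagnose on the fit and reports divergences,
# treedepth saturation, E-BFMI, and R-hat problems as a plain-text summary.
print(fit.diagnose())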

Is there something I’ve missed? Why might the pymc model be so much slower and produce divergences, especially when using the suggested non-centered parameterization? One option to rectify the divergences would be to increase target_accept or to use slightly more informative priors (perhaps on tau), as sketched below.
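For concreteness, a sketch of those two workarounds (the HalfNormal prior on tau is just one possible tighter choice, not taken from the docs):

with pm.Model(coords={'experiment': experiment}) as model_alt:
    mu = pm.Normal('mu', mu=0, sigma=1)
    # More informative, lighter-tailed prior on tau than the HalfCauchy
    tau = pm.HalfNormal('tau', 1)
    z = pm.Normal('z', 0, 1, dims='experiment')
    log_RR = pm.Normal('log_RR', mu + tau * z, sd_log_estimated_rr, observed=log_estimated_rr)
    # Higher acceptance target -> smaller step sizes, fewer divergences
    trace_alt = pm.sample(cores=4, chains=4, target_accept=0.95)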


Non-centered parameterizations are not expected to be universally superior to their centered counterparts (e.g., here). When the data strongly inform each group-level effect, as they do here with large counts and small measurement SDs, the centered form tends to sample better.

It is, in general, very difficult to compare performance across PPLs. The tuning and sampling algorithms are different, and trying to equate them is very challenging… In this particular scenario, ditching the non-centered parameterization samples in 6 seconds for me.
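A minimal sketch of that centered version, reusing the data from the original post (the variable names here are illustrative, not from the reply):

with pm.Model(coords={'experiment': experiment}) as centered_model:
    mu = pm.Normal('mu', mu=0, sigma=1)
    tau = pm.HalfCauchy('tau', 1)
    # Centered parameterization: sample the per-experiment effects directly
    # rather than scaling standard-normal z by tau.
    true_log_rr = pm.Normal('true_log_rr', mu, tau, dims='experiment')
    log_RR = pm.Normal('log_RR', true_log_rr, sd_log_estimated_rr, observed=log_estimated_rr)
    trace_centered = pm.sample(cores=4, chains=4)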

That is fantastically interesting. Thanks +1
