Divergences in a non-centered implementation of the normal/normal hierarchical model

I’m implementing a version of the 8 schools model in pymc. The model code and data are below:

import numpy as np
import pymc as pm


N = 50_000
treatment_conversions = np.array([541, 557, 559, 556, 530, 532, 516, 532, 528, 544, 519, 552])
control_conversions = np.array([496, 524, 486, 500, 516, 475, 507, 475, 490, 506, 512, 489])
log_estimated_rr = np.log(treatment_conversions / control_conversions)
# Delta-method variance of log(RR) is approximately 1/x_t + 1/x_c - 2/N;
# take the square root to get the standard deviation.
sd_log_estimated_rr = np.sqrt(1 / treatment_conversions + 1 / control_conversions - 2 / N)
experiment = [f"Experiment_{i}" for i in range(treatment_conversions.size)]

with pm.Model(coords={'experiment': experiment}) as model:
    mu = pm.Normal('mu', mu=0, sigma=1)
    tau = pm.HalfCauchy('tau', 1)
    z = pm.Normal('z', 0, 1, dims='experiment')
    log_RR = pm.Normal('log_RR', mu + tau * z, sd_log_estimated_rr, observed=log_estimated_rr)
    
with model:
    trace = pm.sample(cores=4, chains=4)

This code is adapted from the pymc documentation. When I run the model, I get divergences, which is strange given that I’ve used the non-centered parameterization. The author(s) of the docs have probably run into the same thing, since they set target_accept to 0.9.
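For reference, the divergence count can be read off the returned trace (a minimal check, assuming pm.sample returns its default InferenceData):

# Each draw carries a boolean "diverging" flag in the sample_stats group.
n_divergent = int(trace.sample_stats["diverging"].sum())
print(f"{n_divergent} divergent transitions")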

This seems odd. The same model written in Stan does not experience any divergences and runs in under a second, whereas the pymc model takes ~30 seconds.

import cmdstanpy


model_data = {
    'n': 12,
    'log_estimated_rr': log_estimated_rr.tolist(),
    'sd_log_estimated_rr': sd_log_estimated_rr.tolist()
}

model_code = '''
data {
  // For model fitting
  int n;                          // number of observations in data
  vector[n] log_estimated_rr;     // log relative lift estimated from each experiment
  vector[n] sd_log_estimated_rr;  // standard deviation of the log relative lift from each experiment
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[n] z;
}
transformed parameters {
  // Non-centered parameterization: implied experiment-level effects
  vector[n] true_log_rr = mu + z * tau;
}
model {
  mu ~ std_normal();
  tau ~ cauchy(0, 1);  // half-Cauchy via the lower=0 constraint
  z ~ std_normal();
  log_estimated_rr ~ normal(true_log_rr, sd_log_estimated_rr);
}
'''

with open('model.stan', 'w') as file:
    file.write(model_code)
    
model = cmdstanpy.CmdStanModel(stan_file='model.stan')

fit = model.sample(data=model_data)
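The equivalent check on the Stan side, using cmdstanpy’s wrapper around CmdStan’s diagnose utility:

# diagnose() runs CmdStan's bin/diagnose on the fit and reports divergences,
# treedepth saturation, E-BFMI, and R-hat problems as a plain-text summary.
print(fit.diagnose())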

Is there something I’ve missed? Why might the pymc model be so much slower and produce divergences, especially when using the suggested non-centered parameterization? One option to rectify the divergences would be to increase target_accept or to use slightly more informative priors (perhaps on tau), as sketched below.
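For concreteness, a sketch of those two workarounds (the HalfNormal prior on tau is just one possible tighter choice, not taken from the docs):

with pm.Model(coords={'experiment': experiment}) as model_alt:
    mu = pm.Normal('mu', mu=0, sigma=1)
    # More informative, lighter-tailed prior on tau than the HalfCauchy
    tau = pm.HalfNormal('tau', 1)
    z = pm.Normal('z', 0, 1, dims='experiment')
    log_RR = pm.Normal('log_RR', mu + tau * z, sd_log_estimated_rr, observed=log_estimated_rr)
    # Higher acceptance target -> smaller step sizes, fewer divergences
    trace_alt = pm.sample(cores=4, chains=4, target_accept=0.95)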


Non-centered parameterizations are not expected to be universally superior to their centered counterparts (e.g., here). When the data strongly inform each group-level effect, as they do here with large counts and small measurement SDs, the centered form tends to sample better.

It is, in general, very difficult to compare performance across PPLs. The tuning and sampling algorithms are different, and trying to equate them is very challenging… In this particular scenario, ditching the non-centered parameterization samples in 6 seconds for me.
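A minimal sketch of that centered version, reusing the data from the original post (the variable names here are illustrative, not from the reply):

with pm.Model(coords={'experiment': experiment}) as centered_model:
    mu = pm.Normal('mu', mu=0, sigma=1)
    tau = pm.HalfCauchy('tau', 1)
    # Centered parameterization: sample the per-experiment effects directly
    # rather than scaling standard-normal z by tau.
    true_log_rr = pm.Normal('true_log_rr', mu, tau, dims='experiment')
    log_RR = pm.Normal('log_RR', true_log_rr, sd_log_estimated_rr, observed=log_estimated_rr)
    trace_centered = pm.sample(cores=4, chains=4)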

That is fantastically interesting. Thanks +1
