Hierarchical Logistic Regression with Binomial Likelihood - "The chain reached the maximum tree depth"; "The rhat statistic is larger than 1.05 for some parameters"; "The estimated number of effective samples is smaller than 200 for some parameters"

I have a dataset with 32 different networks, a single independent variable EXP_FLAG_C, and a target variable CONV_FLAG_ORIGINAL, which looks like below:
[image: preview of the dataset]
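(For orientation, here is a hypothetical reconstruction of the aggregated frame, inferred purely from the column names used in the code below; the values are made up:)

import pandas as pd

# Hypothetical stand-in for the real data_compressed frame
data_compressed = pd.DataFrame({
    'NETWORK': ['net_a', 'net_a', 'net_b', 'net_b'],  # 32 networks in the real data
    'NETWORK_IDX': [0, 0, 1, 1],                      # integer index per network
    'EXP_FLAG_C': [0, 1, 0, 1],                       # the single predictor
    'nCONV_FLAG_ORIGINALs': [12, 30, 7, 19],          # conversion counts
    'ntrials': [100, 110, 95, 102],                   # trials per row
})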

First I built an individual model for each NETWORK with a pm.Binomial likelihood, saving the traces into a dict using the code below:

This takes about 0.4 hours to run.

import pymc3 as pm

run = False
if run:

    indiv_traces = {}

    for network_name in network_names:

        print(f"Sampling for NETWORK: {network_name}")
        # Select the subset of data belonging to this network
        n_data = data_compressed.loc[data_compressed.NETWORK == network_name]
        n_data = n_data.reset_index(drop=True)
        n_exp_flag = n_data.EXP_FLAG_C.values
        n_conv_flag = n_data.nCONV_FLAG_ORIGINALs.values
        n_trials = n_data.ntrials.values

        # Build a model for the selected network
        with pm.Model() as individual_model:

            # priors
            alpha = pm.Normal('alpha', mu=0, sd=1)
            beta = pm.Normal('beta', mu=0, sd=1)

            # linear predictor
            m = alpha + beta * n_exp_flag

            # inverse link function (sigmoid is an alias for the Theano function)
            p = pm.Deterministic('p', pm.math.sigmoid(m))

            # decision boundary: the value that separates the target classes
            bd = pm.Deterministic('bd', -alpha / beta)

            conv = pm.Binomial('conv',
                               p=p,
                               n=n_trials,
                               observed=n_conv_flag)

            # sample the posterior / create the trace
            trace = pm.sample(chains=4,
                              target_accept=0.9,
                              return_inferencedata=True)

        indiv_traces[network_name] = trace
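(As a quick sanity check on these per-network fits, ArviZ can summarize r_hat and effective sample size for each trace; a minimal sketch, assuming the standard ArviZ summary columns:)

import arviz as az

# Per-network convergence check: r_hat and bulk effective sample size
for network_name, trace in indiv_traces.items():
    summary = az.summary(trace, var_names=['alpha', 'beta'])
    print(network_name)
    print(summary[['mean', 'r_hat', 'ess_bulk']])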

Sampling goes smoothly with no divergences, and the beta coefficients for each NETWORK are the following:

Now I'm trying to build a hierarchical model, defining hyperpriors for the group nodes with the code below:

network_idx = data_compressed.NETWORK_IDX.values
n_networks = len(data_compressed.NETWORK_IDX.unique())

with pm.Model() as hr_model_nc:

    # Hyperpriors for group nodes
    mu_a = pm.Normal('mu_alpha', mu=0, sigma=1)
    sigma_a = pm.HalfCauchy('sigma_alpha', beta=5)
    mu_b = pm.Normal('mu_beta', mu=0, sigma=1)
    sigma_b = pm.HalfCauchy('sigma_beta', beta=5)

    # priors
    # Intercept for each network, distributed around the group mean mu_a
    alpha = pm.Normal('alpha',
                      mu=mu_a,
                      sigma=sigma_a,
                      shape=n_networks)

    # Slope for each network, distributed around the group mean mu_b
    beta = pm.Normal('beta',
                     mu=mu_b,
                     sigma=sigma_b,
                     shape=n_networks)

    # linear predictor; this must be elementwise, since pm.math.dot of
    # two equal-length vectors would sum over observations into a scalar
    m = alpha[network_idx] + beta[network_idx] * data_compressed.EXP_FLAG_C.values

    # inverse link function (sigmoid is an alias for the Theano function)
    p = pm.Deterministic('p', pm.math.sigmoid(m))

    # decision boundary, one per network (indexing by network_idx
    # would just repeat the same value for every row)
    bd = pm.Deterministic('bd', -alpha / beta)

    conv = pm.Binomial('conv',
                       p=p,
                       n=data_compressed.ntrials.values,
                       observed=data_compressed.nCONV_FLAG_ORIGINALs.values)

    # priors_hr = pm.sample_prior_predictive()

    # sample the posterior / create the trace
    trace_hr_nc = pm.sample(chains=2,
                            # tune=2000,
                            target_accept=0.96,
                            max_treedepth=15,
                            return_inferencedata=True)

The result of sampling: 0 divergences, but the warning messages quoted in the title (max tree depth reached, high r_hat, low effective sample size).

Increasing the tree depth only makes sampling take longer, and I have already increased target_accept.
My hyperpriors are fairly informative. I don't understand how I should reparametrize the model.
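For what it's worth, the standard fix here is the non-centered parametrization (the model name hr_model_nc suggests this was the intent, but the code above is the centered version): draw standardized offsets and shift/scale them by the group mean and standard deviation, so the sampler no longer has to trace the funnel-shaped dependence between sigma_alpha/sigma_beta and the per-network parameters directly. A minimal sketch of the model above rewritten this way, reusing network_idx and n_networks from before (same priors, same likelihood):

with pm.Model() as hr_model_noncentered:

    # Hyperpriors for group nodes (unchanged)
    mu_a = pm.Normal('mu_alpha', mu=0, sigma=1)
    sigma_a = pm.HalfCauchy('sigma_alpha', beta=5)
    mu_b = pm.Normal('mu_beta', mu=0, sigma=1)
    sigma_b = pm.HalfCauchy('sigma_beta', beta=5)

    # Non-centered: standard-normal offsets, then shift and scale
    alpha_offset = pm.Normal('alpha_offset', mu=0, sigma=1, shape=n_networks)
    alpha = pm.Deterministic('alpha', mu_a + sigma_a * alpha_offset)
    beta_offset = pm.Normal('beta_offset', mu=0, sigma=1, shape=n_networks)
    beta = pm.Deterministic('beta', mu_b + sigma_b * beta_offset)

    # Same linear predictor and likelihood as before
    m = alpha[network_idx] + beta[network_idx] * data_compressed.EXP_FLAG_C.values
    p = pm.Deterministic('p', pm.math.sigmoid(m))

    conv = pm.Binomial('conv',
                       p=p,
                       n=data_compressed.ntrials.values,
                       observed=data_compressed.nCONV_FLAG_ORIGINALs.values)

    trace_nc = pm.sample(chains=2, target_accept=0.9, return_inferencedata=True)

Thomas Wiecki's post "Why hierarchical models are awesome, tricky, and Bayesian" walks through exactly this reparametrization if you want a longer writeup.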

Forest plot from the hierarchical model with the beta coefficients for each NETWORK below:
[image: forest plot of per-network beta coefficients]
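(For reference, this kind of plot can be produced with ArviZ, e.g. az.plot_forest(trace_hr_nc, var_names=['beta'], combined=True).)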

If anybody can help with any thoughts or examples/blogs/articles, it would be much appreciated. Thank you!

Without actually digging into your model, I might suggest reducing the target_accept and increasing the number of tuning steps (using the tune argument). Hitting the max tree depth suggests sub-optimal step sizes (set during tuning) which is going to be exacerbated by asking for very high acceptance rates. Your posterior could definitely be “difficult” and cause these problems on its own, but I would try the easy tweaks first.
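For concreteness, a sketch of what that tweak might look like; the numbers are only illustrative:

trace_hr_nc = pm.sample(chains=2,
                        tune=4000,          # more tuning steps to settle the step size
                        target_accept=0.9,  # backed off from 0.96
                        return_inferencedata=True)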

Thank you @cluhmann for taking a look into this.
Sorry for taking a while to get back to you. I started running it on a GPU system and had to install Ubuntu Linux and set up a conda and pymc3 environment on it to get better speed.
So my latest modification of this hierarchical model, based on your recommendation, is reducing target_accept and increasing tune to 8000, with draws=10000:

trace_hr_nc = pm.sample(chains=2,
                        draws=10000,
                        tune=8000,
                        # max_treedepth=15,
                        return_inferencedata=True)

As a result there are tons of divergences, plus the following: