# Hierarchical Logistic Regression with Binomial Likelihood: chain reached the maximum tree depth; rhat larger than 1.05 for some parameters; estimated number of effective samples smaller than 200 for some parameters

I have a dataset with 32 different networks, a single independent variable `EXP_FLAG_C`, and a target variable `CONV_FLAG_ORIGINAL`, which looks like the below:

First I build an individual model for each NETWORK with a `pm.Binomial` likelihood and save the traces into a dict, using the code below:

```python
run = False
if run:

    indiv_traces = {}

    for network_name in network_names:

        print(f"Sampling for NETWORK: {network_name}")
        # Select subset of data belonging to network
        n_data = data_compressed.loc[data_compressed.NETWORK == network_name]
        n_data = n_data.reset_index(drop=True)
        n_exp_flag = n_data.EXP_FLAG_C
        n_conv_flag = n_data.nCONV_FLAG_ORIGINALs
        n_trials = n_data.ntrials

        # building a model for the selected network
        with pm.Model() as individual_model:

            # priors
            alpha = pm.Normal('alpha', mu=0, sd=1)
            beta = pm.Normal('beta', mu=0, sd=1)

            m = alpha + pm.math.dot(n_exp_flag, beta)

            # inverse link function with alias for the Theano function
            p = pm.Deterministic('p', pm.math.sigmoid(m))

            # boundary decision, which is the value used to separate classes of the target
            bd = pm.Deterministic('bd', -alpha / beta)

            conv = pm.Binomial('conv',
                               p=p,
                               n=n_trials,
                               observed=n_conv_flag)

            # posterior / create the trace
            trace = pm.sample(chains=4,
                              target_accept=0.9,
                              return_inferencedata=True)

        indiv_traces[network_name] = trace
```
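For reference, the `bd = -alpha/beta` line in the model above follows from solving `sigmoid(alpha + beta * x) = 0.5` for `x`: the sigmoid crosses 0.5 exactly where its argument is zero, i.e. at `x = -alpha/beta`. A quick numpy check, with arbitrary example values for `alpha` and `beta`:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

alpha, beta = 0.7, -1.3   # arbitrary example coefficients
bd = -alpha / beta        # decision boundary: where p crosses 0.5

# at x = bd the linear predictor alpha + beta * x is zero,
# so the sigmoid returns exactly 0.5
print(sigmoid(alpha + beta * bd))  # -> 0.5
```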

Sampling goes smoothly with no divergences, and the beta coefficients for each NETWORK are as follows:

Now I’m trying to build a hierarchical model, defining hyperpriors for the group nodes with the code below:

```python
with pm.Model() as hr_model_nc:

    # Hyperpriors for group nodes
    mu_a = pm.Normal('mu_alpha', mu=0, sigma=1)
    sigma_a = pm.HalfCauchy('sigma_alpha', beta=5)
    mu_b = pm.Normal('mu_beta', mu=0, sigma=1)
    sigma_b = pm.HalfCauchy('sigma_beta', beta=5)

    # priors
    # Intercept for each network, distributed around group mean mu_a
    alpha = pm.Normal('alpha',
                      mu=mu_a,
                      sd=sigma_a,
                      shape=len(data_compressed.NETWORK_IDX.unique()))

    # Slope for each network, distributed around group mean mu_b
    beta = pm.Normal('beta',
                     mu=mu_b,
                     sd=sigma_b,
                     shape=len(data_compressed.NETWORK_IDX.unique()))

    m = alpha[network_idx] + pm.math.dot(data_compressed.EXP_FLAG_C.values, beta[network_idx])

    # inverse link function with alias for the Theano function
    p = pm.Deterministic('p', pm.math.sigmoid(m))

    # boundary decision, which is the value used to separate classes of the target
    bd = pm.Deterministic('bd', -alpha[network_idx] / beta[network_idx])

    conv = pm.Binomial('conv',
                       p=p,
                       n=data_compressed.ntrials.values,
                       observed=data_compressed.nCONV_FLAG_ORIGINALs.values)

    # priors_hr = pm.sample_prior_predictive()

    # posterior / create the trace
    trace_hr_nc = pm.sample(chains=2,
                            # tune=2000,
                            target_accept=0.96,
                            max_treedepth=15,
                            return_inferencedata=True)
```

Sampling finished with 0 divergences, but produced the following messages:

Increasing the tree depth only makes sampling take longer. I have already increased target_accept.
My hyperpriors are fairly informative. I don’t understand how I should reparametrize the model.
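One standard reparameterization for hierarchical models like this is the non-centered form: instead of drawing each network's `alpha` directly from `Normal(mu_a, sigma_a)`, draw standardized offsets from `Normal(0, 1)` and shift/scale them deterministically. In PyMC3 this would look roughly like `alpha_offset = pm.Normal('alpha_offset', 0, 1, shape=n_networks)` followed by `alpha = pm.Deterministic('alpha', mu_a + sigma_a * alpha_offset)` (names are illustrative). A minimal numpy sketch of the transform itself, with hypothetical group-level values:

```python
import numpy as np

rng = np.random.default_rng(42)
n_networks = 32

# hypothetical group-level values, standing in for draws of mu_a and sigma_a
mu_a, sigma_a = 0.2, 0.8

# centered form: alpha ~ Normal(mu_a, sigma_a) directly
alpha_centered = rng.normal(mu_a, sigma_a, size=n_networks)

# non-centered form: sample standardized offsets, then shift and scale
alpha_offset = rng.normal(0.0, 1.0, size=n_networks)  # ~ Normal(0, 1)
alpha_noncentered = mu_a + sigma_a * alpha_offset

# both forms define the same distribution for alpha, but the sampler
# explores (mu_a, sigma_a, offset) more easily because the offsets
# are independent of the group-level scale (avoiding funnel geometry)
print(alpha_noncentered.mean(), alpha_noncentered.std())
```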

Forest plot from the hierarchical model with the beta coefficient for each NETWORK below:

If anybody can help with any thoughts or examples/blogs/articles, it would be much appreciated. Thank you!

Without actually digging into your model, I might suggest reducing the target_accept and increasing the number of tuning steps (using the tune argument). Hitting the max tree depth suggests sub-optimal step sizes (set during tuning) which is going to be exacerbated by asking for very high acceptance rates. Your posterior could definitely be “difficult” and cause these problems on its own, but I would try the easy tweaks first.
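For intuition on the tree-depth warning: NUTS doubles the trajectory at each stage of tree building, so a `max_treedepth` of `d` allows on the order of `2**d` leapfrog steps per draw. A smaller step size (which is roughly what a very high `target_accept` forces during tuning) means more steps are needed to cover the same distance, pushing chains into the cap; raising the cap just buys more steps per draw. A quick back-of-the-envelope calculation:

```python
# rough leapfrog-step budget implied by max_treedepth: the trajectory
# doubles at each depth, so depth d allows on the order of 2**d steps
def max_leapfrog_steps(max_treedepth):
    return 2 ** max_treedepth

print(max_leapfrog_steps(10))  # PyMC3 default depth of 10 -> 1024 steps
print(max_leapfrog_steps(15))  # depth 15 as above -> 32768 steps, ~32x the work per draw
```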

Thank you @cluhmann for taking a look at this.
Sorry for taking so long to get back to you. I started running it on a GPU system and had to install Ubuntu Linux and set up a conda/pymc3 environment on it for better speed.
So my latest modification of this hierarchical model, based on your recommendation, is to reduce target_accept and increase tune to 8,000, with draws=10000:

```python
trace_hr_nc = pm.sample(chains=2,
                        draws=10000,
                        tune=8000,
                        # max_treedepth = 15,
                        return_inferencedata=True)
```

As a result: tons of divergences, and the following: