Hierarchical Logistic Regression with Binomial Likelihood - "The chain reached the maximum tree depth"; "The rhat statistic is larger than 1.05 for some parameters"; "The estimated number of effective samples is smaller than 200 for some parameters"

I have a dataset with 32 different networks, a single independent variable EXP_FLAG_C, and a target variable CONV_FLAG_ORIGINAL, which looks like below:
[image: preview of the dataset]
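(For orientation, here is a hypothetical reconstruction of the aggregated frame, inferred purely from the column names used in the code below; the values are made up:)

import pandas as pd

# Hypothetical stand-in for the real data_compressed frame
data_compressed = pd.DataFrame({
    'NETWORK': ['net_a', 'net_a', 'net_b', 'net_b'],  # 32 networks in the real data
    'NETWORK_IDX': [0, 0, 1, 1],                      # integer index per network
    'EXP_FLAG_C': [0, 1, 0, 1],                       # the single predictor
    'nCONV_FLAG_ORIGINALs': [12, 30, 7, 19],          # conversion counts
    'ntrials': [100, 110, 95, 102],                   # trials per row
})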

First I built an individual model for each NETWORK with a pm.Binomial likelihood, saving the traces into a dict using the code below:

This takes about 0.4 hours to run.

import pymc3 as pm

run = False
if run:

    indiv_traces = {}

    for network_name in network_names:

        print(f"Sampling for NETWORK: {network_name}")
        # Select the subset of data belonging to this network
        n_data = data_compressed.loc[data_compressed.NETWORK == network_name]
        n_data = n_data.reset_index(drop=True)
        n_exp_flag = n_data.EXP_FLAG_C.values
        n_conv_flag = n_data.nCONV_FLAG_ORIGINALs.values
        n_trials = n_data.ntrials.values

        # Build a model for the selected network
        with pm.Model() as individual_model:

            # priors
            alpha = pm.Normal('alpha', mu=0, sd=1)
            beta = pm.Normal('beta', mu=0, sd=1)

            # linear predictor
            m = alpha + beta * n_exp_flag

            # inverse link function (sigmoid is an alias for the Theano function)
            p = pm.Deterministic('p', pm.math.sigmoid(m))

            # decision boundary: the value that separates the target classes
            bd = pm.Deterministic('bd', -alpha / beta)

            conv = pm.Binomial('conv',
                               p=p,
                               n=n_trials,
                               observed=n_conv_flag)

            # sample the posterior / create the trace
            trace = pm.sample(chains=4,
                              target_accept=0.9,
                              return_inferencedata=True)

        indiv_traces[network_name] = trace
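(As a quick sanity check on these per-network fits, ArviZ can summarize r_hat and effective sample size for each trace; a minimal sketch, assuming the standard ArviZ summary columns:)

import arviz as az

# Per-network convergence check: r_hat and bulk effective sample size
for network_name, trace in indiv_traces.items():
    summary = az.summary(trace, var_names=['alpha', 'beta'])
    print(network_name)
    print(summary[['mean', 'r_hat', 'ess_bulk']])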

Sampling goes smoothly with no divergences, and the beta coefficients for each NETWORK are the following:

Now I'm trying to build a hierarchical model, defining hyperpriors for the group nodes with the code below:

network_idx = data_compressed.NETWORK_IDX.values
n_networks = len(data_compressed.NETWORK_IDX.unique())

with pm.Model() as hr_model_nc:

    # Hyperpriors for group nodes
    mu_a = pm.Normal('mu_alpha', mu=0, sigma=1)
    sigma_a = pm.HalfCauchy('sigma_alpha', beta=5)
    mu_b = pm.Normal('mu_beta', mu=0, sigma=1)
    sigma_b = pm.HalfCauchy('sigma_beta', beta=5)

    # priors
    # Intercept for each network, distributed around the group mean mu_a
    alpha = pm.Normal('alpha',
                      mu=mu_a,
                      sigma=sigma_a,
                      shape=n_networks)

    # Slope for each network, distributed around the group mean mu_b
    beta = pm.Normal('beta',
                     mu=mu_b,
                     sigma=sigma_b,
                     shape=n_networks)

    # linear predictor; this must be elementwise, since pm.math.dot of
    # two equal-length vectors would sum over observations into a scalar
    m = alpha[network_idx] + beta[network_idx] * data_compressed.EXP_FLAG_C.values

    # inverse link function (sigmoid is an alias for the Theano function)
    p = pm.Deterministic('p', pm.math.sigmoid(m))

    # decision boundary, one per network (indexing by network_idx
    # would just repeat the same value for every row)
    bd = pm.Deterministic('bd', -alpha / beta)

    conv = pm.Binomial('conv',
                       p=p,
                       n=data_compressed.ntrials.values,
                       observed=data_compressed.nCONV_FLAG_ORIGINALs.values)

    # priors_hr = pm.sample_prior_predictive()

    # sample the posterior / create the trace
    trace_hr_nc = pm.sample(chains=2,
                            # tune=2000,
                            target_accept=0.96,
                            max_treedepth=15,
                            return_inferencedata=True)

The result of sampling: 0 divergences, but the warning messages quoted in the title (max tree depth reached, high r_hat, low effective sample size).

Increasing the tree depth only makes sampling take longer, and I have already increased target_accept.
My hyperpriors are fairly informative. I don't understand how I should reparametrize the model.
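For what it's worth, the standard fix here is the non-centered parametrization (the model name hr_model_nc suggests this was the intent, but the code above is the centered version): draw standardized offsets and shift/scale them by the group mean and standard deviation, so the sampler no longer has to trace the funnel-shaped dependence between sigma_alpha/sigma_beta and the per-network parameters directly. A minimal sketch of the model above rewritten this way, reusing network_idx and n_networks from before (same priors, same likelihood):

with pm.Model() as hr_model_noncentered:

    # Hyperpriors for group nodes (unchanged)
    mu_a = pm.Normal('mu_alpha', mu=0, sigma=1)
    sigma_a = pm.HalfCauchy('sigma_alpha', beta=5)
    mu_b = pm.Normal('mu_beta', mu=0, sigma=1)
    sigma_b = pm.HalfCauchy('sigma_beta', beta=5)

    # Non-centered: standard-normal offsets, then shift and scale
    alpha_offset = pm.Normal('alpha_offset', mu=0, sigma=1, shape=n_networks)
    alpha = pm.Deterministic('alpha', mu_a + sigma_a * alpha_offset)
    beta_offset = pm.Normal('beta_offset', mu=0, sigma=1, shape=n_networks)
    beta = pm.Deterministic('beta', mu_b + sigma_b * beta_offset)

    # Same linear predictor and likelihood as before
    m = alpha[network_idx] + beta[network_idx] * data_compressed.EXP_FLAG_C.values
    p = pm.Deterministic('p', pm.math.sigmoid(m))

    conv = pm.Binomial('conv',
                       p=p,
                       n=data_compressed.ntrials.values,
                       observed=data_compressed.nCONV_FLAG_ORIGINALs.values)

    trace_nc = pm.sample(chains=2, target_accept=0.9, return_inferencedata=True)

Thomas Wiecki's post "Why hierarchical models are awesome, tricky, and Bayesian" walks through exactly this reparametrization if you want a longer writeup.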

Forest plot from the hierarchical model with the beta coefficients for each NETWORK below:
[image: forest plot of per-network beta coefficients]
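(For reference, this kind of plot can be produced with ArviZ, e.g. az.plot_forest(trace_hr_nc, var_names=['beta'], combined=True).)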

If anybody can help with any thoughts or examples/blogs/articles, it would be much appreciated. Thank you!

Without actually digging into your model, I might suggest reducing the target_accept and increasing the number of tuning steps (using the tune argument). Hitting the max tree depth suggests sub-optimal step sizes (set during tuning) which is going to be exacerbated by asking for very high acceptance rates. Your posterior could definitely be “difficult” and cause these problems on its own, but I would try the easy tweaks first.
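For concreteness, a sketch of what that tweak might look like; the numbers are only illustrative:

trace_hr_nc = pm.sample(chains=2,
                        tune=4000,          # more tuning steps to settle the step size
                        target_accept=0.9,  # backed off from 0.96
                        return_inferencedata=True)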

Thank you @cluhmann for taking a look into this.
Sorry for taking a while to get back to you. I started running it on a GPU system and had to install Ubuntu Linux and set up a conda and pymc3 environment on it to get better speed.
So my latest modification of this hierarchical model, based on your recommendation, is reducing target_accept and increasing tune to 8000, with draws=10000:

trace_hr_nc = pm.sample(chains=2,
                        draws=10000,
                        tune=8000,
                        # max_treedepth=15,
                        return_inferencedata=True)

As a result there are tons of divergences, plus the following: