Hierarchical Binomial regression - ineffective sampling

DrZ · January 16, 2018, 6:06pm

Hi all,

I am trying to write a model on a hierarchical binomial regression model and the convergence is quite slow even when working on simulated data.
Here is a snippet with a reproducible example:

import numpy as np
import pymc3
from scipy.special import expit
from sklearn.preprocessing import LabelBinarizer
import matplotlib.pyplot as plt


items = 400
k_mean = -7
k_sigma = 0.5
noise_sigma = 0.15
trials_lambda = 0.02
samples_per_item = 3

# drawing values at an item level
item_conversion_rates = np.random.normal(loc=k_mean, scale=k_sigma, size=items)

# values at a sample level
labels = np.repeat(np.arange(items), samples_per_item)
sample_conversion_rates = np.repeat(item_conversion_rates, samples_per_item)

trials = np.random.geometric(p=trials_lambda, size=samples_per_item * items)

adjusted_conversion_rate = expit(sample_conversion_rates)
conversions = np.random.binomial(trials, adjusted_conversion_rate)

indexed_model = pymc3.Model()
with indexed_model:
    # Priors for unknown model parameters
    k_mu = pymc3.Normal('c_mu', mu=0, sd=10)
    k_sigma = pymc3.HalfNormal('c_sigma', sd=10)

    # Matt trick
    k_step = pymc3.Normal('k_step', mu=0,
                          sd=1, shape=items)
    k = k_mu + k_step * k_sigma

    # Modeling conversions
    k_indexed = k[labels]
    conv_rate = pymc3.math.invlogit(k_indexed)

    # Likelihood
    product_conversions = pymc3.Binomial('product_conversion',
                                         n=trials,
                                         p=conv_rate,
                                         observed=conversions)

with indexed_model:
    indexed_trace = pymc3.sample(
        6000,
        tune=6000,
        njobs=4,
        chains=4,
        control=dict(target_accept=0.95,
                     max_treedepth=20),
        compute_convergence_checks=True,
    )

print(pymc3.summary(indexed_trace))
pymc3.traceplot(indexed_trace)
plt.show()
plt.scatter(indexed_trace['c_mu'], indexed_trace['c_sigma'], marker='.', alpha=0.1)
plt.show()

I am getting many errors of the type:

UserWarning: Chain 2 contains 85 diverging samples after tuning. If increasing `target_accept` does not help try to reparameterize.

I also see that the sampler seems to get stuck into some regions of the parameter space - the scatter plot produced by the script above displays a ‘half-banana’ shape. I tried plotting the divergences and they do not seem to be clustered at the tip of the half banana - see below.

Any idea on how to improve that?

twiecki · January 16, 2018, 8:33pm

You’re already using the Matt trick as well as the other usual suggestions. Before digging deeper myself I would advise you to update to master where the NUTS logging has been much improved (https://github.com/pymc-devs/pymc3/commit/39cd75d65aeca6b038f154bd8c151e1b38840766). Most relevant to your case, it allows you to not only see the divergent samples but exactly where in the leapfrog trajectory the divergence occured. Perhaps that gives more insight into where the sampler is having trouble.

It is a bit puzzling though as the posterior really doesn’t look that bad (or maybe it becomes very thin in the tip of the banana).

junpenglao · January 16, 2018, 10:48pm

If you do plt.hist(adjusted_conversion_rate, 100); you can see that the overall rate is really low, which means that when you specify your prior c_mu with a zero mean Normal, there are not much weight in the tail: if doing invlogit on your prior (ie, a zero mean gaussian), you can see that the density peak around .5.

In this case, you should adjust your prior to have more weight at the lower end.

junpenglao · January 16, 2018, 11:02pm

Also, you should visualize the scatter plot using plt.scatter(indexed_trace['c_mu'], indexed_trace['c_sigma_log__'], marker='.', alpha=0.1), which is the unbounded space where the samples are taking. You can see the funnel as mu goes even smaller (since when mu goes smaller, after transformation the p is really close to 0, and the variance is bounded as a function of p: 0 < sd < p*(1-p))

aseyboldt · January 17, 2018, 11:21am

The keyword for changing the adaptation settings in nuts isn’t control as in stan, but nuts_kwargs. Apparently we introduced a bug recently, that makes it so that unknown arguments to pm.sample are ignored. We should raise an exception here…

Although the posterior isn’t as nice as one might hope, it still samples without problems for me if I pass in

nuts_kwargs=dict(target_accept=0.95,
                 max_treedepth=20),

Topic		Replies	Views
Hierarchical Binomial Model - Not converging Questions	2	888	February 1, 2019
Hierarchical Logistic Regression with Binomial Likelihood - The chain reached the maximum tree depth, The rhat statistic is larger than 1.05 for some parameters, The estimated number of effective samples is smaller than 200 for some parameters Questions difficult_inference	2	741	December 16, 2021
Many Divergences in NUTS Sampler with Beta Binomial Regression v3 development , modeling	2	474	October 25, 2022
Improving model convergence and sampling speed Questions	8	6279	September 27, 2019
Sampler gets stuck in hierarchical logistic regression modeling	4	76	April 30, 2025

Hierarchical Binomial regression - ineffective sampling

Related topics