Hierarchical Binomial regression - ineffective sampling

Hi all,

I am trying to write a model on a hierarchical binomial regression model and the convergence is quite slow even when working on simulated data.
Here is a snippet with a reproducible example:

import numpy as np
import pymc3
from scipy.special import expit
from sklearn.preprocessing import LabelBinarizer
import matplotlib.pyplot as plt


items = 400
k_mean = -7
k_sigma = 0.5
noise_sigma = 0.15
trials_lambda = 0.02
samples_per_item = 3

# drawing values at an item level
item_conversion_rates = np.random.normal(loc=k_mean, scale=k_sigma, size=items)

# values at a sample level
labels = np.repeat(np.arange(items), samples_per_item)
sample_conversion_rates = np.repeat(item_conversion_rates, samples_per_item)

trials = np.random.geometric(p=trials_lambda, size=samples_per_item * items)

adjusted_conversion_rate = expit(sample_conversion_rates)
conversions = np.random.binomial(trials, adjusted_conversion_rate)

indexed_model = pymc3.Model()
with indexed_model:
    # Priors for unknown model parameters
    k_mu = pymc3.Normal('c_mu', mu=0, sd=10)
    k_sigma = pymc3.HalfNormal('c_sigma', sd=10)

    # Matt trick
    k_step = pymc3.Normal('k_step', mu=0,
                          sd=1, shape=items)
    k = k_mu + k_step * k_sigma

    # Modeling conversions
    k_indexed = k[labels]
    conv_rate = pymc3.math.invlogit(k_indexed)

    # Likelihood
    product_conversions = pymc3.Binomial('product_conversion',
                                         n=trials,
                                         p=conv_rate,
                                         observed=conversions)

with indexed_model:
    indexed_trace = pymc3.sample(
        6000,
        tune=6000,
        njobs=4,
        chains=4,
        control=dict(target_accept=0.95,
                     max_treedepth=20),
        compute_convergence_checks=True,
    )

print(pymc3.summary(indexed_trace))
pymc3.traceplot(indexed_trace)
plt.show()
plt.scatter(indexed_trace['c_mu'], indexed_trace['c_sigma'], marker='.', alpha=0.1)
plt.show()

I am getting many errors of the type:

UserWarning: Chain 2 contains 85 diverging samples after tuning. If increasing `target_accept` does not help try to reparameterize.

I also see that the sampler seems to get stuck into some regions of the parameter space - the scatter plot produced by the script above displays a ‘half-banana’ shape. I tried plotting the divergences and they do not seem to be clustered at the tip of the half banana - see below.
image
Any idea on how to improve that?

You’re already using the Matt trick as well as the other usual suggestions. Before digging deeper myself I would advise you to update to master where the NUTS logging has been much improved (https://github.com/pymc-devs/pymc3/commit/39cd75d65aeca6b038f154bd8c151e1b38840766). Most relevant to your case, it allows you to not only see the divergent samples but exactly where in the leapfrog trajectory the divergence occured. Perhaps that gives more insight into where the sampler is having trouble.

It is a bit puzzling though as the posterior really doesn’t look that bad (or maybe it becomes very thin in the tip of the banana).

If you do plt.hist(adjusted_conversion_rate, 100); you can see that the overall rate is really low, which means that when you specify your prior c_mu with a zero mean Normal, there are not much weight in the tail: if doing invlogit on your prior (ie, a zero mean gaussian), you can see that the density peak around .5.

In this case, you should adjust your prior to have more weight at the lower end.

Also, you should visualize the scatter plot using plt.scatter(indexed_trace['c_mu'], indexed_trace['c_sigma_log__'], marker='.', alpha=0.1), which is the unbounded space where the samples are taking. You can see the funnel as mu goes even smaller (since when mu goes smaller, after transformation the p is really close to 0, and the variance is bounded as a function of p: 0 < sd < p*(1-p))
image

2 Likes

The keyword for changing the adaptation settings in nuts isn’t control as in stan, but nuts_kwargs. Apparently we introduced a bug recently, that makes it so that unknown arguments to pm.sample are ignored. We should raise an exception here…

Although the posterior isn’t as nice as one might hope, it still samples without problems for me if I pass in

nuts_kwargs=dict(target_accept=0.95,
                 max_treedepth=20),