Hierarchical Logistic Regression sampling problems

I'm trying to perform some inference on a hierarchical logistic regression model involving three nominal predictors. (I'm following along with the Chapter 21 notebook from Kruschke's Doing Bayesian Data Analysis: https://github.com/JWarmenhoven/DBDA-python/blob/master/Notebooks/Chapter%2021.ipynb)

I'm able to sample using NUTS, and the sampling seems reasonable, but it's taking roughly 5 minutes to perform the inference on 84 data points with 18 total predictors, and I was wondering whether this is typical performance.

I don't know whether the shape of the posterior is what's causing the problem, since there doesn't seem to be correlation between most of the variables in the scatter plots I produced.

Here is a scatter plot of one of the omega variables against the kappa variable (the individual variance):

and here is a more typical plot, between a coefficient and its scaling parameter:
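
(For reference, the plots were produced roughly like this; a minimal sketch assuming the posterior samples are in the MultiTrace `trace` returned by pm.sample.)

import matplotlib.pyplot as plt

# Plot one component of omega against kappa to look for posterior
# correlations; trace['omega'] has one column per data point
plt.scatter(trace['omega'][:, 0], trace['kappa'], alpha=0.1)
plt.xlabel('omega[0]')
plt.ylabel('kappa')
plt.show()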

Attached is the Python code I'm using to build the model; the model itself is also reproduced below for quick reference.

Any help is greatly appreciated.

import pymc3 as pm
import theano.tensor as tt

# x_1, x_3, x_4 (categorical predictors), n, and y_obs come from the
# data-preparation code in the attached script.
with pm.Model() as model:
    # Intercept; uses the precision (tau) parameterization, equivalent to sd=2
    a_0 = pm.Normal(
        'a_0', mu=0.0, tau=1 / (2 ** 2), shape=1
    )

    # Non-centered parameterization for each predictor's coefficients
    a_1_sigma = pm.HalfNormal('a_1_sigma', sd=10)
    a_1_offset = pm.Normal(
        'a_1_offset', mu=0, sd=1,
        shape=x_1.eval().dtype.categories.size
    )
    a_1 = pm.Deterministic('a_1', 0.0 + a_1_offset * a_1_sigma)

    a_3_sigma = pm.HalfNormal('a_3_sigma', sd=10)
    a_3_offset = pm.Normal(
        'a_3_offset', mu=0, sd=1,
        shape=x_3.eval().dtype.categories.size
    )
    a_3 = pm.Deterministic('a_3', 0.0 + a_3_offset * a_3_sigma)

    a_4_sigma = pm.HalfNormal('a_4_sigma', sd=10)
    a_4_offset = pm.Normal(
        'a_4_offset', mu=0, sd=1,
        shape=x_4.eval().dtype.categories.size
    )
    a_4 = pm.Deterministic('a_4', 0.0 + a_4_offset * a_4_sigma)

    # Parameters for categories
    link_argument = (
        a_0 +
        a_1[x_1.eval().codes] +
        a_3[x_3.eval().codes] +
        a_4[x_4.eval().codes]
    )
    omega = pm.Deterministic('omega', pm.invlogit(link_argument))
    kappa = pm.Gamma('kappa', 0.1, 0.1)

    # Mean parameter for individual data (shape=84 for train data)
    mu = pm.Beta(
        'mu', alpha=omega * kappa + 1, beta=(1 - omega) * kappa + 1,
        shape=x_1.eval().codes.size
    )

    likelihood = pm.Binomial('likelihood', p=mu, n=n, observed=y_obs)

    # Rescale coefficients to be deflections from baseline
    b_0 = pm.Deterministic('b_0', tt.mean(link_argument))
    b_1 = pm.Deterministic('b_1', link_argument[x_1.eval().codes] - b_0)
    b_3 = pm.Deterministic('b_3', link_argument[x_3.eval().codes] - b_0)
    b_4 = pm.Deterministic('b_4', link_argument[x_4.eval().codes] - b_0)
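
The sampling call itself is in the attached script; it is essentially the following (the draw and tune counts shown here are placeholders, not the actual values):

with model:
    # NUTS is assigned automatically for these continuous variables;
    # draws/tune below stand in for the values in the attached script
    trace = pm.sample(draws=2000, tune=1000)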

hierarchical_logistic_regression.py (4.0 KB)
anonymized_data.csv (3.8 KB)

The model seems fine to me, and the scatter plot also looks good. If the trace plot looks fine (in terms of mixing) and there are no warnings (especially no divergence warnings), then there is no reason for concern.

But should sampling take 5 minutes for so few data points? I was hoping to expand this to include more data points and more categories within the predictors.

Also, there was a warning about 36 divergent samples after tuning.
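
A quick way to read the count off the trace, in case it is useful (a sketch; `trace` is the MultiTrace returned by pm.sample):

# NUTS records a boolean 'diverging' flag for every sample; summing
# the flags gives the number of divergent samples
n_divergent = trace['diverging'].sum()
print('Divergent samples: %d' % n_divergent)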

For a linear model like this, you can usually get a bit of an improvement in performance by using the MAP estimate as the starting value.
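
Something like this (a sketch; the draw and tune counts are just placeholders):

with model:
    # Use the MAP estimate as the starting point for the sampler
    start = pm.find_MAP()
    trace = pm.sample(2000, tune=1000, start=start)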

Also, sampling is not necessarily fast just because the data set is small: you have 21 predictors in your linear equation, so a bit more data might actually help the model converge, since there would be more information coming from the likelihood.

One more observation: your kappa prior is concentrated near zero, while the posterior of kappa is quite large. I think this is the main cause of the divergences. You should change it to something wider.
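
For example, something like the following (a sketch; the particular values are only an illustration, not a recommendation):

# A wider Gamma prior on kappa, using the mu/sd parameterization so
# that less prior mass is piled up near zero (values are illustrative)
kappa = pm.Gamma('kappa', mu=10.0, sd=10.0)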


I see, thank you for the help!
