Bambi MUCH faster than my PyMC implementation of Negative Binomial regression

I am doing a simple negative binomial regression (I am following along with the text “Bayes Rules!” except using PyMC/Bambi for the exercises).
The data is 37 rows consisting of counts of eagle sightings, the year of the data, and the hours of observation during that year. Bambi quickly devours this (58 seconds on my MacBook) with this model:

import bambi as bmb

# Exponential prior on the Negative Binomial dispersion parameter alpha
alpha_prior = bmb.Prior('Exponential', lam=1/50)
model2 = bmb.Model('count ~ year + hours', eagles, family='negativebinomial', priors={'alpha': alpha_prior})
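
The 58 seconds is from a plain default fit, something like this (draws, chains, and tuning all left at Bambi's defaults), which returns an ArviZ InferenceData object with the posterior draws:

    # fit with Bambi's default sampler settings
    results = model2.fit()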

For ‘educational’ purposes I wanted to see if I could understand this by building my own model. After many iterations I ended up with this beast:

import pymc as pm

COORDS = {"regressors": ["year", "hours"], "obs_idx": eagles.index}

with pm.Model(coords=COORDS) as negBin:

    # define priors, weakly informative Normals
    b0 = pm.Normal("Intercept", mu=0, sigma=500)
    b = pm.Normal("slopes", mu=0, sigma=0.4, dims="regressors")
    alpha_p = pm.Exponential("alpha", lam=1/50)

    # register the observed data with the model
    Y = pm.ConstantData("year", eagles["year"].to_numpy(), dims="obs_idx")
    H = pm.ConstantData("hours", eagles["hours"].to_numpy(), dims="obs_idx")
    C = pm.ConstantData("count", eagles["count"].to_numpy(), dims="obs_idx")

    # define linear model and exp inverse-link function
    theta = pm.math.exp(b0 + b[0] * Y + b[1] * H)

    # define Negative Binomial likelihood
    y = pm.NegativeBinomial("y", mu=theta, alpha=alpha_p, observed=C, dims="obs_idx")
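
For completeness, the sampling call itself is just the defaults (nothing tuned), roughly:

    # sample with PyMC's default NUTS settings
    with negBin:
        idata = pm.sample()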

This takes about 20 minutes!! And it sometimes diverges. I am sure I am doing something bone-headed, but after hours of banging my head against this I am reaching out in the hope of help.

Since the inverse link function for the Negative Binomial likelihood is the exponential, you have to be a bit careful with your priors. It’s very easy to blow up the sampler by ending up with values that are too large, leading to overflows and bad times. Basically, the quantity b0 + b[0] * Y + b[1] * H needs to stay below about 700. This also means you need to think about the scale of your data, Y and H.
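
To make the 700 figure concrete: float64 overflows just past exp(709), so anything in that neighbourhood becomes inf and the sampler falls over. A quick NumPy check (nothing model-specific):

    import numpy as np

    np.exp(709.0)  # ~8.2e307, still a finite float64
    np.exp(710.0)  # inf -- overflow; this is what blows up the log-likelihood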

So the two dangers that jumped out at me in this regard were 1) the intercept has a standard deviation of 500, and 2) the value of “year” is in the thousands, so something like 0.4 * 1000 = 400 is already quite large. Two easy fixes, then, are to reduce the standard deviation of the intercept, and to re-scale “year” so it runs from 0 to 1 rather than from 1981 to 2020 (since it’s just a deterministic trend anyway). Here is what I changed:

    b0 = pm.Normal("Intercept", mu=0, sigma=100)
    b = pm.Normal("slopes", mu=0, sigma=0.4, dims="regressors")
    alpha_p = pm.Exponential("alpha", lam=1/50)

    Y = pm.ConstantData("year", (eagles.year - eagles.year.min()) / (eagles.year.max() - eagles.year.min()), dims="obs_idx")

After these changes the model samples in ~10 seconds without divergences.
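
If you want to verify, you can count divergent transitions and summarise the posterior from the returned InferenceData (ArviZ naming assumed, with idata from pm.sample() above):

    import arviz as az

    # total divergent transitions across all chains -- should be 0
    int(idata.sample_stats["diverging"].sum())
    az.summary(idata, var_names=["Intercept", "slopes", "alpha"])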

Oh man, thank you so much!! Your reply also reminded me that about six months ago I read how important it is to ‘center’ your data to improve sampling in general, but of course that important idea fell out of my head!
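
For anyone else following along, centering (or standardizing) can be done directly on the DataFrame before building the model; a minimal sketch, with illustrative column names:

    # center year, standardize hours (new column names are just for illustration)
    eagles["year_c"] = eagles["year"] - eagles["year"].mean()
    eagles["hours_s"] = (eagles["hours"] - eagles["hours"].mean()) / eagles["hours"].std()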

@DrEntropy I’m very happy to see there’s someone going through Bayes Rules! with Bambi. Would you be interested in contributing an example to the bambinos/educational-resources repository on GitHub?

On top of that, I guess the Bambi model works better out of the box because there’s actually some centering going on in the predictors (we just do it under the hood ;))
