Bambi MUCH faster than my PyMC implementation of Negative Binomial regression

I am doing a simple negative binomial regression (I am following along with the text “Bayes Rules!” except using PyMC/Bambi for the exercises).
The data is 37 rows consisting of counts of eagle sightings, the year of the data, and the hours of observation during that year. Bambi quickly devours this (58 seconds on my MacBook) with this model:

import bambi as bmb

# Exponential prior on the Negative Binomial dispersion parameter alpha
alpha_prior = bmb.Prior('Exponential', lam=1/50)
model2 = bmb.Model('count ~ year + hours', eagles, family='negativebinomial', priors={'alpha': alpha_prior})
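
The 58 seconds is from a plain default fit, something like this (draws, chains, and tuning all left at Bambi's defaults), which returns an ArviZ InferenceData object with the posterior draws:

    # fit with Bambi's default sampler settings
    results = model2.fit()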

For ‘educational’ purposes I wanted to see if I could understand this by building my own model. After many iterations I ended up with this beast:

import pymc as pm

COORDS = {"regressors": ["year", "hours"], "obs_idx": eagles.index}

with pm.Model(coords=COORDS) as negBin:

    # define priors, weakly informative Normals
    b0 = pm.Normal("Intercept", mu=0, sigma=500)
    b = pm.Normal("slopes", mu=0, sigma=0.4, dims="regressors")
    alpha_p = pm.Exponential("alpha", lam=1/50)

    # register the observed data with the model
    Y = pm.ConstantData("year", eagles["year"].to_numpy(), dims="obs_idx")
    H = pm.ConstantData("hours", eagles["hours"].to_numpy(), dims="obs_idx")
    C = pm.ConstantData("count", eagles["count"].to_numpy(), dims="obs_idx")

    # define linear model and exp inverse-link function
    theta = pm.math.exp(b0 + b[0] * Y + b[1] * H)

    # define Negative Binomial likelihood
    y = pm.NegativeBinomial("y", mu=theta, alpha=alpha_p, observed=C, dims="obs_idx")
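
For completeness, the sampling call itself is just the defaults (nothing tuned), roughly:

    # sample with PyMC's default NUTS settings
    with negBin:
        idata = pm.sample()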

This takes about 20 minutes!! And it sometimes diverges. I am sure I am doing something bone-headed, but after hours of banging my head against this I am reaching out in the hope of help.

Since the inverse link function for the Negative Binomial likelihood is the exponential, you have to be a bit careful with your priors. It’s very easy to blow up the sampler by ending up with values that are too large, leading to overflows and bad times. Basically, the quantity b0 + b[0] * Y + b[1] * H needs to stay below about 700. This also means you need to think about the scale of your data, Y and H.
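
To make the 700 figure concrete: float64 overflows just past exp(709), so anything in that neighbourhood becomes inf and the sampler falls over. A quick NumPy check (nothing model-specific):

    import numpy as np

    np.exp(709.0)  # ~8.2e307, still a finite float64
    np.exp(710.0)  # inf -- overflow; this is what blows up the log-likelihood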

So the two dangers that jumped out at me in this regard were 1) the intercept has a standard deviation of 500, and 2) the value of “year” is in the thousands, so something like 0.4 * 1000 = 400 is already quite large. Two easy fixes, then, are to reduce the standard deviation of the intercept, and to re-scale “year” so it runs from 0 to 1 rather than from 1981 to 2020 (since it’s just a deterministic trend anyway). Here is what I changed:

    b0 = pm.Normal("Intercept", mu=0, sigma=100)
    b = pm.Normal("slopes", mu=0, sigma=0.4, dims="regressors")
    alpha_p = pm.Exponential("alpha", lam=1/50)

    Y = pm.ConstantData("year", (eagles.year - eagles.year.min()) / (eagles.year.max() - eagles.year.min()), dims="obs_idx")

After these changes the model samples in ~10 seconds without divergences.
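
If you want to verify, you can count divergent transitions and summarise the posterior from the returned InferenceData (ArviZ naming assumed, with idata from pm.sample() above):

    import arviz as az

    # total divergent transitions across all chains -- should be 0
    int(idata.sample_stats["diverging"].sum())
    az.summary(idata, var_names=["Intercept", "slopes", "alpha"])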

Oh man, thank you so much!! Your reply also reminded me that about six months ago I read how important it is to ‘center’ your data to improve sampling in general, but of course that important idea fell out of my head!
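
For anyone else following along, centering (or standardizing) can be done directly on the DataFrame before building the model; a minimal sketch, with illustrative column names:

    # center year, standardize hours (new column names are just for illustration)
    eagles["year_c"] = eagles["year"] - eagles["year"].mean()
    eagles["hours_s"] = (eagles["hours"] - eagles["hours"].mean()) / eagles["hours"].std()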

@DrEntropy I’m very happy to see there’s someone going through Bayes Rules! with Bambi. Would you be interested in contributing an example to the bambinos/educational-resources repository on GitHub?

On top of that, I guess the Bambi model works better out of the box because there’s actually some centering going on in the predictors (we just do it under the hood ;))
