Negative binomial model with exposure

Hi, I’ve been working on a negative binomial regression in PyMC. Attached is a simplified version of my model (in reality I have more parameters in the linear equation, and a multilevel component).

with pm.Model() as model:
    x_1 = pm.MutableData(
        "x_1", X_train['x_1']
    )
    exposure = pm.MutableData("exposure", X_train["exposure"])

    y_obs = pm.MutableData("y_obs", y_train["y"])


    intercept_mu = pm.Normal(
        "intercept_mu", 0, 10
    )

    beta_mu_x1 = pm.Normal("beta_mu_x1", 0, 5)

    mu = pm.Deterministic(
        "mu",
        pm.math.exp(
            intercept_mu
            + x_1 * beta_mu_x1
        )
        * exposure,
    )
    
    intercept_alpha = pm.Normal(
        "intercept_alpha", 0, 10
    )

    beta_alpha_x1 = pm.Normal("beta_alpha_x1", 0, 5)

    alpha = pm.Deterministic(
        "alpha",
        pm.math.exp(
            intercept_alpha
            + x_1 * beta_alpha_x1
        )
    )
    y = pm.NegativeBinomial("y", mu=mu, alpha=alpha, observed=y_obs)

As you can see, I have quite a lot of data and sampling is quite slow. With the full parameterisation it takes c. 1h30 to fit, the results look reasonable though.

I would appreciate if someone could just check that the model is sound for my peace of mind!

I’m definitely not an expert in pymc, but having worked with GPs with Negative Binomial likelihoods recently in pymc, it seems correct to me :slight_smile: I would wait though for the confirmation of someone else with more experience.

Regarding the sampling: Which NUTS sampler are you using? I tried nuts_sampler="numpyro", which was way faster in my case. I also tried nutpie but it turned out to be much slower.

2 Likes

Thanks for checking!

I was using the default sampler and changing to numpyro doubled the speed so thanks for that too.