Better Negative Binomial Model specification?

I’m using the following model to try and model the count of events - I don’t run into any divergences, but the sampling could definitely be better and the r_hat values are relatively high (1.03-1.07).

Is there a better way to specify negative binomial models than this?

Also any recommendations on a less underfit “trend” variable? It’s currently just an array of np.arange(1, len(data)+1)

 with pm.Model() as m1:
α_month = pm.Normal('α_month', 0., 5., shape=12)
     α_hour = pm.Normal('α_hour', 0., 5., shape=24)
     α_day = pm.Normal('α_day', 0., 5., shape=7)
     β_trend = pm.Normal('β_trend', 0., 5.)

log_trend = tt.log(data.trend + 1)
log_trend_std = log_trend / log_trend.max()

θ = α_month[np.array(month_ids)]\
        + α_hour[np.array(hour_ids)]\
        + α_day[np.array(day_ids)]\
        + β_trend * log_trend_std

μ = tt.exp(θ)

α = pm.Lognormal('α', 0., 5.)

counts = pm.NegativeBinomial('counts', μ, α, observed=data.counts)
trace = pm.sample(cores=4)

This seems like a complicated way of modeling a time series (I assume that’s what this is, given the month/day/hour inputs). Why not use a flexible latent time series for theta, such as a Gaussian random walk or a Gaussian process? If you want to account for periodic behavior, a GP is particularly useful.


Not sure if this is particularly useful, but some priors over the dispersion parameter can lead to issues.

e.g., see here: Stan Prior Choice - Section Story: When the generic prior fails. The case of the Negative Binomial., or alternatively here:


@MrinankSharma thank you, I was having trouble trying to find resources prior choices for NBDs!

@fonnesbeck You’re definitely correct on that - one of the reasons I’m not is that I don’t have experience with GPs (word on the street is that I can get an intro from Osvaldo Martin’s book so I’ll check that out) - if you have any simple examples of how to fit one to a time series such as this with day of week, monthly, hourly, seasonality it’d be much appreciated it!

Question for you though, my goal is to extrapolate into the future and flag event counts that are too low, and ideally I’d like to only train a model once at the start of each week. What would that extrapolating prediction process look like with a fit GP?

Hi @KyleJCaron,
I think you can take a look at the Mauna Loa examples on the PyMC website – they should give you a good idea of GPs’ capabilities and answer most of your questions.
Hope this helps :vulcan_salute:

1 Like

@AlexAndorra This is fantastic, thank you! Just what I was looking for!

1 Like

I managed to get a nice looking model sampled on a small subset of my data (10 days worth of hourly count data, so 240 observations). Now I’m trying to scale it up and I’m finding it to be incredibly slow at sampling (estimated 2 days). I’m fairly certain there are medium term irregularities as well, but the model is too slow on the data I have now (and it’s only a small subset so they’re less likely to arise in the subset). Any advice on how to better specify this?

with pm.Model() as nbd_model:
    # hourly periodic component 
    ℓ_psmooth = pm.Gamma("ℓ_psmooth", alpha=35, beta=48)
    cov_seasonal =, 0.002736, ℓ_psmooth)  # 0.002736 = 1/365.25
    gp_seasonal =
    # small/medium term irregularities
#     η_med = pm.HalfCauchy("η_med", beta=0.5, testval=0.1)
#     ℓ_med = pm.Gamma("ℓ_med", alpha=2, beta=60)
#     α = pm.Gamma("α", alpha=5, beta=2)
#     cov_medium = η_med ** 2 *, ℓ_med, α)
#     gp_medium =

    # long term trend
    η_trend = pm.HalfCauchy("η_trend", beta=2, testval=2.0)
    ℓ_trend = pm.Gamma("ℓ_trend", alpha=20, beta=3) # Expecting a count increase of 6-10 per year
    cov_trend = η_trend ** 2 *, ℓ_trend)
    gp_trend =

    # The Gaussian process is a sum of these three components
    gp = gp_seasonal + gp_trend #+ gp_medium

    mu_pred = gp.prior("mu_pred", X=t)
    mu = pm.Deterministic("mu", pm.math.exp(mu_pred))
    # Map to data with NBD and sample
    alpha = pm.Exponential('alpha', 1)
    obs = pm.NegativeBinomial('obs', mu=mu, alpha = alpha, observed = y )

    nbd_trace = pm.sample(1000, tune=1000)

Here’s an image of the fit on a small subset of data (10 days), x=time in hours, y = count of events. The blue line is the average hourly seasonality, the red line is the actual data, the black bars denote the end/start of days, and the rest is the model. I’m fairly certain that a negative binomial model is a good choice as there are a lot of days with spike such as the one around x=140 (and the variance is much higher than the mean when grouping data by the hour). There’s no trend in this subset but there is in the entire dataset