Simple Negative Binomial Model to recover N from a Binomial Model?

Hi,
I’m trying to set up what I think should be a simple negative binomial model.
If weekly_reservations ~ B( total_reservations , reservation_rate ), I’m trying to recover total_reservations given a level of weekly_reservations for a given week.
I’ve set up this model so far, but the values for the posterior reservation_rate don’t match the expected values, and it seems like I’m missing a transform somewhere.
I have been unlucky trying to find any specific examples for something like this. Any ideas much appreciated!

Data:

weekly_reservations = np.array([    14,     41,     41,     44,     74,    104,    162,    205,
           274,    329,    372,    473,    534,    666,    788,    927,
          1147,   1399,   1673,   2007,   2476,   3003,   3674,   4344,
          5237,   6306,   8026,   9895,  12125,  14792,  18071,  21929,
         26690,  32855,  39759,  48684,  59371,  71765,  86179, 102734,
        121958, 142976, 166365, 191495, 218210, 245875, 272319, 298141,
        323690, 349570, 375785, 405290, 430483, 433995])

total_reservations = np.array([434001, 434001, 434001, 434001, 434001, 434001, 434001, 434001,
        434001, 434001, 434001, 434001, 434001, 434001, 434001, 434001,
        434001, 434001, 434001, 434001, 434001, 434001, 434001, 434001,
        434001, 434001, 434001, 434001, 434001, 434001, 434001, 434001,
        434001, 434001, 434001, 434001, 434001, 434001, 434001, 434001,
        434001, 434001, 434001, 434001, 434001, 434001, 434001, 434001,
        434001, 434001, 434001, 434001, 434001, 434001])

My model so far:

n_weeks = len(total_reservations)

with pm.Model() as simple_model:
    
    alpha = pm.HalfNormal("alpha", sd=1)
    
    sd_a = pm.HalfNormal("sd_a", sd=1)
    sd_b = pm.HalfNormal("sd_b", sd=1)

    a = pm.HalfNormal("a", sd=sd_a)
    b = pm.HalfNormal("b", sd=sd_b)

    reservation_rate = pm.Beta("reservation_rate", alpha=a, beta=b, shape=n_weeks)

    est_total = pm.Deterministic('est_total', weekly_reservations / reservation_rate)

    N = pm.NegativeBinomial('N', mu=est_total, alpha=alpha, observed=total_reservations)

However, rate reservation_rate doesn’t quite match what I’m expecting:

(Note: updated to better reflect the business use case)

Hi @clausherther!

I’m confused about your problem: you say that you have data that is negative-binomial distributed. Is that x or y in your code? And what is the other? Also, what’s plotted on the x axis in your graph? Sorry for the questions, just trying to understand what’s going on here. :slightly_smiling_face:

Sorry, yeah I might not have been that clear after futzing around with this last night too long!

I have data that is binomially distributed, where x ~ B(p, y). For context, x represents weekly cumulative reservations, p is the percentage of the total these cumulative reservations represent to-date, and y is the final reservation number.
What I’d like to model is the inverse of that where I model y, with p as a variable and x as the input. So, in other words, since I don’t know the final reservation number, I’d like to model that as a target variable, given each week’s cumulative reservations to-date and p for the week.
My understanding is that that would be a Negative Binomial, but I may be incorrect there. Any ideas?

I updated the original post to hopefully be more clear!

I see! The first thing that comes to mind would be to model your weekly reservations as Poisson distributed. It’s true that the negative binomial model is a generalization of the Poisson model, which might be why you’re dabbling with that. I’d suggest fitting a Poisson first: if that doesn’t work out well, then try negative binomial.

What’s more concerning to me is that the rate of increase of reservations increases rapidly with time, which makes your problem really non-stationary (i.e. there’s no one Poisson or lambda value that would describe all your data, but this would be a problem no matter which model you chose). Do you have reason to believe that this increases with time (i.e. restaurant gets more popular or something)?

1 Like

Thanks for that suggestion, will give Poisson a try.

The reservations to-date are similar to hotel bookings for future weekends. So each week represents the reservations we have to-date for some future week (the actual model will have 52 of these for each of the next 52 weeks, by week of when the reservation was placed).

I think one thing I’m definitely doing wrong is pooling a and b in that Beta distribution, and/or using a Beta distribution here at all. In the last few weeks, closer to the actual event, the reservations are almost all in, so the rate is close to 100%. Using a beta, the mean of that distribution will not be close enough to 1 ever.

I’ve tried this much more relaxed model which seems to get me closer…

n_weeks = len(total_reservations)

with pm.Model() as simple_model:
    
    alpha = pm.HalfNormal("alpha", sd=50)
   
    reservation_rate = pm.Uniform("reservation_rate", lower=0, upper=2, shape=n_weeks)

    est_total = pm.Deterministic('est_total', weekly_reservations / reservation_rate)

    N = pm.NegativeBinomial('N', mu=est_total, alpha=alpha, observed=total_reservations)

Just to follow-up here, a Poisson likelihood worked best, thanks @_eigenfoo! However, to get it to work, I had to specify testval from the actual data in the Beta distribution for the reservation rate.

So, this is unpooled since it’s only one model:

n_weeks = len(total_reservations)
res_rate_obs = weekly_reservations/total_reservations

with pm.Model() as simple_model:
    
    sd_a = pm.HalfNormal("sd_a", sd=10, shape=n_weeks)
    sd_b = pm.HalfNormal("sd_b", sd=10, shape=n_weeks)

    a = pm.HalfNormal("a", sd=sd_a, shape=n_weeks)
    b = pm.HalfNormal("b", sd=sd_b, shape=n_weeks)

    reservation_rate = pm.Beta("reservation_rate", alpha=a, beta=b, shape=n_weeks, testval=res_rate_obs)
    est_total = pm.Deterministic('est_total', weekly_reservations / reservation_rate)
    
    N = pm.Poisson('N', mu=est_total, observed=total_reservations)

1 Like