I have what I think is a fairly simple hierarchical model that tries to model bookings for events at 2 venues (think concerts, or conference bookings). However, even given sample data generated from the prior distributions, I have a hard time getting it to converge properly, and I’m wondering if I’m doing something obviously wrong here.
I’ve created some representative sample data in this file: poc_data.csv (153.7 KB).
This training dataset has 1,664 rows: one for each combination of event week of the year (52), number of weeks leading up to the event week (1-16), and venue (2).
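To make the row count concrete, here is a minimal sketch of how I think of the grid. The 0-based index names mirror the `*_idx` columns I use in the model code, but the dictionary layout is only an illustration, not the actual structure of poc_data.csv.

```python
from itertools import product

n_event_weeks = 52  # event weeks in the year
n_res_rate = 16     # n-weeks of lead time before each event week
n_venues = 2

# One row per (event week, n-week, venue) combination.
# Illustrative layout only; the real file has more columns.
rows = [
    {"event_week_idx": w, "n_week_idx": k, "venue_idx": v}
    for w, k, v in product(range(n_event_weeks), range(n_res_rate), range(n_venues))
]

print(len(rows))  # 1664
```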
The idea is that, for each of the up to 16 weeks leading up to an event (called the n-week), we compare the reservations observed so far for that event week with the historically observed final reservation numbers for the same week of the year. This gives us a historic reservation rate that we can later use to extrapolate current reservations to final reservations.
In addition, we model a booking rate, which is just 1 minus the cancel rate, since not all reservations result in a final booking. We have observed 52 different booking rates, since they differ a bit seasonally from event week to week.
Lastly, we split the bookings between the 2 venues at a 75%/25% split.
The final number of bookings at the venue level is then just:
venue_bookings = reservations_observed/reservation_rate * booking_rate * venue_split
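As a quick worked example of that formula (the input numbers are made up for illustration and are not taken from the data file):

```python
def venue_bookings(reservations_observed, reservation_rate, booking_rate, venue_split):
    """Expected final bookings for one venue, following the formula above."""
    # Scale the reservations seen so far up to the implied final total.
    implied_total_reservations = reservations_observed / reservation_rate
    # Apply the booking rate, then allocate the venue's share.
    return implied_total_reservations * booking_rate * venue_split


# Hypothetical inputs: 200 reservations seen so far, which historically is
# 40% of the final total, a 90% booking rate, and the 75% venue.
expected = venue_bookings(200, 0.40, 0.90, 0.75)
print(round(expected, 2))  # 337.5
```

So 200 reservations at a 40% reservation rate imply 500 final reservations, of which 450 become bookings and 337.5 are expected at the larger venue.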
I’ve modeled the likelihood of the bookings as a Poisson distribution (and the attached data file is generated from a Poisson), and the applicable rates and splits as Beta distributions, since they’re all proportions between 0 and 1.
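For reference, a Beta(alpha, beta) distribution has mean alpha / (alpha + beta) and variance alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1)). The sketch below shows how a target mean and a "concentration" (alpha + beta) map onto the two shape parameters. This is only to illustrate what the alpha and beta values mean; it is not how my model sets them, and the helper names are my own.

```python
def beta_params(mean, concentration):
    """Convert a mean in (0, 1) and a concentration (alpha + beta) into (alpha, beta)."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must be strictly between 0 and 1")
    if concentration <= 0.0:
        raise ValueError("concentration must be positive")
    return mean * concentration, (1.0 - mean) * concentration


def beta_mean(alpha, beta):
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1.0))


# Hypothetical example: a 75% venue split with a concentration of 20.
alpha, beta = beta_params(mean=0.75, concentration=20.0)
print(alpha, beta)  # 15.0 5.0
```

A larger concentration keeps the same mean but shrinks the variance, i.e. it expresses more certainty about the rate.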
Hope that makes sense!
Below is the model. I’ve tried a bunch of different variations of it with regard to the hyperparameters etc., but I’m still getting pretty poor convergence.
I’d appreciate any suggestions re: the model, the parameterization etc. Thanks in advance!
import pandas as pd
import pymc3 as pm
df = pd.read_csv("poc_data.csv")
n_res_rate = 16
n_event_weeks = 52
n_venues = 2
n = n_res_rate * n_event_weeks * n_venues
res_rate_obs = df["res_rate_obs"].unique()
res_week_idx = df["n_week_idx"].values
book_rate_obs = df["book_rate_obs"].unique()
event_week_idx = df["event_week_idx"].values
venue_split_obs = df["venue_split_obs"].unique()
venue_idx = df["venue_idx"].values
reservations_obs = df["reservations_obs"].values
total_events_venue_obs = df["total_events_venue_obs"].values
with pm.Model() as poc_model:
    # ---------------------------
    # Reservation Rate
    # ---------------------------
    # We model up to 16 weeks ("n-week") from the actual event,
    # where each n-week has a different rate, i.e.
    # at n-weeks away from the event, the currently observed reservation volume
    # represents x % of the final total for the event week
    sd_a_r = pm.HalfNormal("sd_a_r", sd=100, shape=n_res_rate)
    sd_b_r = pm.HalfNormal("sd_b_r", sd=100, shape=n_res_rate)
    a_r = pm.HalfNormal("a_r", sd=sd_a_r, shape=n_res_rate)
    b_r = pm.HalfNormal("b_r", sd=sd_b_r, shape=n_res_rate)
    res_rate = pm.Beta("res_rate", alpha=a_r, beta=b_r,
                       shape=n_res_rate,
                       testval=res_rate_obs)
    # ---------------------------
    # Booking Rate
    # ---------------------------
    # Not all reservations result in a final booking,
    # so we model a different booking rate (= 1 - Cancel Rate)
    # for each event week; however, we partially pool them
    # since they shouldn't differ too much across weeks
    sd_a_br = pm.HalfNormal("sd_a_br", sd=10)
    sd_b_br = pm.HalfNormal("sd_b_br", sd=10)
    a_br = pm.HalfNormal("a_br", sd=sd_a_br, shape=n_event_weeks)
    b_br = pm.HalfNormal("b_br", sd=sd_b_br, shape=n_event_weeks)
    booking_rate = pm.Beta("booking_rate", alpha=a_br, beta=b_br,
                           shape=n_event_weeks,
                           testval=book_rate_obs)
    # ---------------------------
    # Venue Split
    # ---------------------------
    # We allocate bookings across 2 venues along a 75/25% split
    sd_a_v = pm.HalfNormal("sd_a_v", sd=1, shape=n_venues)
    sd_b_v = pm.HalfNormal("sd_b_v", sd=1, shape=n_venues)
    a_v = pm.HalfNormal("a_v", sd=sd_a_v, shape=n_venues)
    b_v = pm.HalfNormal("b_v", sd=sd_b_v, shape=n_venues)
    venue_split = pm.Beta("venue_split", alpha=a_v, beta=b_v,
                          shape=n_venues, testval=venue_split_obs)
    # ---------------------------
    # Bookings
    # ---------------------------
    # Final bookings are a function of the implied total reservations volume, given by:
    # currently observed total reservations for the week, divided by the reservation rate for the n-week
    total_events = reservations_obs / res_rate[res_week_idx]
    # we then allocate total events to each venue and multiply by the booking rate for the event week
    venue_bookings_rate = total_events * venue_split[venue_idx] * booking_rate[event_week_idx]
    venue_bookings = pm.Poisson('venue_bookings', mu=venue_bookings_rate,
                                observed=total_events_venue_obs)