Problem with convergence in hierarchical partially pooled model

I have what I think is a fairly simple hierarchical model that tries to model bookings for events at 2 venues (think concerts, or conference bookings). However, even given sample data generated from the prior distributions, I have a hard time getting it to properly converge and I’m wondering if I’m doing some obviously wrong here.
I’ve created some representative sample data in this file: poc_data.csv (153.7 KB).

This training dataset has 1,664 values, one row for each week of the year (52), the number of weeks leading up to each event week (1-16) and the number of venues (2).
The idea is that up to 16 weeks (called n-week) leading up to the event, we observe the reservations for the given event week in relation to the historically observed final reservation numbers for the week of the year. This gives us a historic reservation rate, that we can then later use to extrapolate current reservations to final reservations.
In addition, we model a booking rate, which is just 1 minus the cancel rate, since not all reservations result in a final booking. We have observed 52 different booking rates, since they differ a bit seasonally from event week to week.
Lastly, we split the bookings by venue at a 75/25% split.
The final number of bookings at the venue level is then just:
venue_bookings = reservations_observed/reservation_rate * booking_rate * venue_split

I’ve modeled the likelihood of the bookings as a Poisson distribution (and the attached data file is generated from a Poisson), and then applicable rates and splits as Beta distributions, since they’re all %s between 0 and 1.
Hope that makes sense!

Below is the model, of which I’ve tried a bunch of different variations with re: to hyper-parameters etc. But I’m still getting pretty poor convergence.
I’d appreciate any suggestions re: the model, the parameterization etc. Thanks in advance!

import pandas as pd
import pymc3 as pm

df = pd.read_csv("poc_data.csv")

n_res_rate = 16
n_event_weeks = 52
n_venues = 2
n = n_res_rate * n_event_weeks * n_venues 

res_rate_obs = df["res_rate_obs"].unique()
res_week_idx = df["n_week_idx"].values

book_rate_obs = df["book_rate_obs"].unique()
event_week_idx = df["event_week_idx"].values

venue_split_obs = df["venue_split_obs"].unique()
venue_idx = df["venue_idx"].values

reservations_obs = df["reservations_obs"].values
total_events_venue_obs = df["total_events_venue_obs"].values
with pm.Model() as poc_model:
    
    # ---------------------------
    # Reservation Rate
    # ---------------------------
    # We model up to 16 weeks ("n-week") from the actual event, 
    # where each n-week has a different rate, i.e.
    # at n-weeks away from the event, the currently observed reservation volume 
    # represents x % of the final total for the event week
    sd_a_r = pm.HalfNormal("sd_a_r", sd=100, shape=n_res_rate)
    sd_b_r = pm.HalfNormal("sd_b_r", sd=100, shape=n_res_rate)

    a_r = pm.HalfNormal("a_r", sd=sd_a_r, shape=n_res_rate)
    b_r = pm.HalfNormal("b_r", sd=sd_b_r, shape=n_res_rate)
    
    res_rate = pm.Beta("res_rate", alpha=a_r, beta=b_r, 
                        shape=n_res_rate, 
                        testval=res_rate_obs)
    
    # ---------------------------
    # Booking Rate
    # ---------------------------
    # Not all reservations result in a final booking, 
    # so we model a different booking rate (= 1 - Cancel Rate)
    # for each event week; however, we partially pool them 
    # since they shouldn't differ too much across weeks 
    sd_a_br = pm.HalfNormal("sd_a_br", sd=10)
    sd_b_br = pm.HalfNormal("sd_b_br", sd=10)

    a_br = pm.HalfNormal("a_br", sd=sd_a_br, shape=n_event_weeks)
    b_br = pm.HalfNormal("b_br", sd=sd_b_br, shape=n_event_weeks)

    booking_rate = pm.Beta("booking_rate", alpha=a_br, beta=b_br, 
                                            shape=n_event_weeks, 
                                             testval=book_rate_obs)
    
    # ---------------------------
    # Venue Split
    # ---------------------------
    # We allocate bookings across 2 venues along a 75/25% split
    sd_a_v = pm.HalfNormal("sd_a_v", sd=1, shape=n_venues)
    sd_b_v = pm.HalfNormal("sd_b_v", sd=1, shape=n_venues)

    a_v = pm.HalfNormal("a_v", sd=sd_a_v, shape=n_venues)
    b_v = pm.HalfNormal("b_v", sd=sd_b_v, shape=n_venues)
    venue_split = pm.Beta("venue_split", alpha=a_v, beta=b_v, 
                               shape=n_venues, test_val=venue_split_obs)

    # ---------------------------
    # Bookings
    # ---------------------------
    # Final bookings are a factor of the implied total reservations volume, given by:
    # currently observed total reservations for the week, divided by the reservations rate for the n-week
    total_events = reservations_obs / res_rate[res_week_idx]
    # we then allocate total events to each venue and multiply by the booking rate for the event week
    venue_bookings_rate = total_events * venue_split[venue_idx] * booking_rate[event_week_idx]
    
    venue_bookings = pm.Poisson('venue_bookings', mu=venue_bookings_rate,
                                 observed=total_events_venue_obs)

For example, the parameter distribution recovered for the venue_split variable, which are 2 simple values, looks like this, so clearly the sampler has issues sampling this model, but I can’t tell where it’s misspecified:

Also, I’ve run a test model where I specify hard values for the alpha and beta parameters for each of the beta prior, which converges very well, so I’m thinking I’ve maybe misspecified the hyper-priors in the model above?

This is the test model, for comparison:

with pm.Model() as baseline_model:
    
    # ---------------------------
    # Reservation Rate
    # ---------------------------
    res_rate = pm.Beta("res_rate", alpha=res_rate_obs * 1000, 
                                   beta=(1-res_rate_obs)*1000, 
                                   shape=n_res_rate, 
                                   testval=res_rate_obs)
    
    # ---------------------------
    # Booking Rate
    # ---------------------------
    booking_rate = pm.Beta("booking_rate", 
                           alpha=book_rate_obs * 1000, 
                           beta=(1-book_rate_obs)*1000, 
                           shape=n_event_weeks, 
                           testval=book_rate_obs)
    
    # ---------------------------
    # Venue Split
    # ---------------------------
    venue_split = pm.Beta("venue_split", 
                          alpha=venue_split_obs * 1000, 
                          beta=(1-venue_split_obs)*1000, 
                          shape=n_venues)

    # ---------------------------
    # Bookings
    # ---------------------------
    total_events = reservations_obs / res_rate[res_week_idx]
    venue_bookings_rate = total_events * venue_split[venue_idx] * booking_rate[event_week_idx]
    
    venue_bookings = pm.Poisson('venue_bookings', mu=venue_bookings_rate, observed=total_events_venue_obs)

This kind of convergence is what I’d like to see from the real model:

You model would be over specify if you do:

    sd_a_r = pm.HalfNormal("sd_a_r", sd=100, shape=n_res_rate)
    sd_b_r = pm.HalfNormal("sd_b_r", sd=100, shape=n_res_rate)

    a_r = pm.HalfNormal("a_r", sd=sd_a_r, shape=n_res_rate)
    b_r = pm.HalfNormal("b_r", sd=sd_b_r, shape=n_res_rate)

plus sd=100 is likely too much. Try instead:

    sd_a_r = pm.HalfNormal("sd_a_r", sd=5)
    sd_b_r = pm.HalfNormal("sd_b_r", sd=5)

    a_r = pm.HalfNormal("a_r", sd=sd_a_r, shape=n_res_rate)
    b_r = pm.HalfNormal("b_r", sd=sd_b_r, shape=n_res_rate)

@junpenglao thanks for that suggestion! I’ve tried this and several other combinations of specifying hyper parameters for the 3 variables and cannot get it to recover the parameters correctly. I’ve also tried not setting prior distributions for the sd parameters and just hard coding them in the a and b parameters with marginally better luck.
E.g.
1)


2)

I’m really stumped here! Shouldn’t this be a fairly easy model to sample?

Well,
https://twitter.com/betanalpha/status/1046952141821288448

My suggestion is to avoid using a Beta-binomial model - a logistic mixed model would be much easier to implement (and we understand it better as well) .

Ah thanks, although that’s a downer! Would you happen to know of any good examples of how to respecify something like this? Sorry if this is a little noob-y, but do you mean rewriting this multiplicative model into an additive model by taking logs? And then doing a hierarchical GLM (like the ones in the PyMC3 examples?)
I’ve looked at those of course, but I haven’t been able to figure out how to frame my data generating process into something like that.

Thanks in advance as always!