Hello! I am struggling to specify my hierarchical model in PyMC5. I am sure that part of the problem is that I am relatively new to bayesian analysis, but I am also struggling with the code syntax. Using the data below, the goal is to predict soda_sales
. Here is an example of how the data looks:
theater_id | region | quarter | candy_sales | soda_sales |
---|---|---|---|---|
1 | North | 1 | 1400 | 58391 |
1 | North | 2 | 1234 | 59148 |
1 | North | 3 | 5423 | 68582 |
1 | North | 4 | 1345 | 59017 |
2 | East | 1 | 6541 | 49275 |
2 | East | 2 | 9541 | 10043 |
2 | East | 3 | 1348 | 49294 |
2 | East | 4 | 5913 | 23454 |
3 | West | 1 | 5911 | 32048 |
3 | West | 2 | 1396 | 48940 |
3 | West | 3 | 6032 | 23442 |
3 | West | 4 | 6926 | 34990 |
Here is my code for a basic hierarchical model:
theater_idx, theaters = df.theater_id.factorize()
coords = {'theater': theaters}
with pm.Model(coords=coords) as partial_pooled_model:
mu_pooled_params = pm.find_constrained_prior(
pm.Normal,
lower=0,
upper=400,
init_guess={'mu':200, 'sigma': 50},
)
mu_pooled = pm.Normal('mu_pooled', **mu_pooled_params)
sigma_pooled_params = pm.find_constrained_prior(
pm.HalfNormal,
lower=1,
upper=75,
init_guess={'sigma':50}
)
sigma_pooled = pm.HalfNormal('sigma_pooled', **sigma_pooled_params)
with partial_pooled_model:
mu = pm.Normal('mu', mu=mu_pooled, sigma=sigma_pooled, dims='theater')
sigma = pm.HalfNormal('sigma', sigma=50)
y_hat = pm.Normal('y_hat', mu=mu[theater_idx], sigma=sigma, observed=y)
pm.model_to_graphviz(partial_pooled_model)
This model does an excellent job of finding separate mu’s for each theater. But now I want to extend the model to include the other predictors in my data and I am totally lost. How do I update y_hat to include the other partially-pooled categorical variables as well as the continuous variables? I’ve read through the blog posts and other questions here but I just cant crack it. Would love any insight you can offer.
Thanks!