Hierarchical model w/ categorical and continuous predictors

Hello! I am struggling to specify my hierarchical model in PyMC5. I am sure that part of the problem is that I am relatively new to Bayesian analysis, but I am also struggling with the code syntax. Using the data below, the goal is to predict soda_sales. Here is an example of how the data looks:

theater_id  region  quarter  candy_sales  soda_sales
1           North   1        1400         58391
1           North   2        1234         59148
1           North   3        5423         68582
1           North   4        1345         59017
2           East    1        6541         49275
2           East    2        9541         10043
2           East    3        1348         49294
2           East    4        5913         23454
3           West    1        5911         32048
3           West    2        1396         48940
3           West    3        6032         23442
3           West    4        6926         34990

Here is my code for a basic hierarchical model:

theater_idx, theaters = df.theater_id.factorize()
coords = {'theater': theaters}

with pm.Model(coords=coords) as partial_pooled_model:
    # find_constrained_prior needs the distribution and the lower/upper bounds
    # that should hold the probability mass; the bounds here are placeholders
    mu_pooled_params = pm.find_constrained_prior(
        pm.Normal, lower=100, upper=300, init_guess={'mu': 200, 'sigma': 50}
    )
    mu_pooled = pm.Normal('mu_pooled', **mu_pooled_params)

    sigma_pooled_params = pm.find_constrained_prior(
        pm.HalfNormal, lower=1, upper=100, init_guess={'sigma': 50}
    )
    sigma_pooled = pm.HalfNormal('sigma_pooled', **sigma_pooled_params)

with partial_pooled_model:
    mu = pm.Normal('mu', mu=mu_pooled, sigma=sigma_pooled, dims='theater')
    sigma = pm.HalfNormal('sigma', sigma=50)
    y_hat = pm.Normal('y_hat', mu=mu[theater_idx], sigma=sigma, observed=y)


This model does an excellent job of finding separate mu's for each theater. But now I want to extend the model to include the other predictors in my data, and I am totally lost. How do I update y_hat to include the other partially pooled categorical variables as well as the continuous variables? I've read through the blog posts and other questions here but I just can't crack it. Would love any insight you can offer.


How do you want to extend it? Do you want to include both candy and soda sales as predicted quantities and add a quarter-wise effect? Note that region and theater IDs are completely correlated in the sample you show (i.e., there is only one theater in the North, etc.), so there is no sense in adding region as a separate predictor unless you have a larger dataset with more theaters per region. In any case:

Adding candy:

This is relatively straightforward and can be done by adding an extra coordinate with two elements:

coords = {"theater": theaters, "items": ["candy","soda"]}

Then when defining your priors, you have to specify dims="items" and also make sure that you are supplying the correct number of parameters. For instance, to create two normals centred at two different means you can do:

pm.Normal("mu_pooled", [candy_mu, soda_mu], [candy_sd, soda_sd], dims="items")

You can either guess these inputs or get them from find_constrained_prior as you did. You should also set your final mu as:

mu = pm.Normal('mu', mu=mu_pooled, sigma=sigma_pooled, dims=["theater","items"])

You should also define a separate likelihood sigma for each item, since they are on different scales. Alternatively, you can get rid of the likelihood sigma and use a Poisson likelihood, which I think is more natural since your data is discrete:

y_hat = pm.Poisson('y_hat', mu=mu[theater_idx,:], observed=y)

For adding quarter as a predictor, you first have to decide how it should enter the model. An overall quarterly increase or decrease in sales of everything, for instance? Anything more complicated than that (such as a region by quarter effect) will probably run into identifiability issues if your real data is not larger, with less correlated predictors, than what you show here.

PS: A good place to look for examples of hierarchical models is chapter 9 of John Kruschke's Doing Bayesian Data Analysis. Fortunately, there are PyMC adaptations of the code shown there too.


Thank you so much for the reply! You are correct that the dataset is much larger and there are many distinct theater_ids within each region, so these should not be perfectly correlated.

To clarify, I would like to predict soda_sales with the other variables as predictors. The region, quarter, and theater_id are categories, but candy_sales is continuous (as is the target, soda_sales).

My main sticking point is how to calculate mu given one hierarchical variable (theater_id) plus some additional categorical and continuous predictors. Right now I am treating all the categorical variables as hierarchical and just adding everything together in mu, as below. How else could I do it if I wanted just one hierarchical variable, theater_id, and the rest treated as typical predictors in a regression?

region_idx, regions = X.region.factorize()
coords['region'] = regions

type_idx, types = X.type.factorize()
coords['type'] = types

quarter_idx, quarters = X.quarter.factorize()
coords['quarter'] = quarters

with pm.Model(coords=coords) as model:
    regional_mu_param = pm.Normal('regional_mu_param', mu=200, sigma=50)
    regional_sigma_param = pm.HalfNormal('regional_sigma_param', sigma=50)
    regional_mu = pm.Normal('regional_mu', mu=regional_mu_param, sigma=regional_sigma_param, dims='region')

    type_mu_param = pm.Normal('type_mu_param', mu=200, sigma=50)
    type_sigma_param = pm.HalfNormal('type_sigma_param', sigma=50)
    type_mu = pm.Normal('type_mu', mu=type_mu_param, sigma=type_sigma_param, dims='type')

    quarter_mu_param = pm.Normal('quarter_mu_param', mu=200, sigma=50)
    quarter_sigma_param = pm.HalfNormal('quarter_sigma_param', sigma=50)
    quarter_mu = pm.Normal('quarter_mu', mu=quarter_mu_param, sigma=quarter_sigma_param, dims='quarter')

    theater_mu_param = pm.Normal('theater_mu_param', mu=200, sigma=50)
    theater_sigma_param = pm.HalfNormal('theater_sigma_param', sigma=50) 
    theater_mu = pm.Normal('theater_mu', mu=theater_mu_param, sigma=theater_sigma_param, dims='theater')

    #Candy sales
    candy_beta = pm.Normal('candy_beta', mu= 150, sigma=25)
    #Combine to calculate mu
    mu = theater_mu[theater_idx] + regional_mu[region_idx] + type_mu[type_idx]  + quarter_mu[quarter_idx] + candy_beta * X['candy_sales']
    sigma = pm.HalfNormal('sigma', sigma=50)
    y_hat = pm.Normal('y_hat', mu=mu, sigma=sigma, observed=y)

If by that you mean you want to use the categorical values as observed covariates for the regression, with slope coefficients that represent their effect sizes, then you are looking at regression with categorical variables:

The main idea is to take your categories and encode them as some sort of dummy variables (have a look at the examples in the link above) and use those as covariates. For instance, if your categories were male and female, then each entry in your data would be coded by two new numerical covariates: either [1, 0] or [0, 1], the first representing male, the second female. In pandas you can achieve this with, for instance:

pd.get_dummies(df, columns=["quarter"], dtype=int)

There are, however, subtle points you need to pay attention to (they would definitely create problems for likelihood-based frequentist statistics, and I would assume they might also do so for Bayesian analysis unless you have very informative priors). For instance, when you do such a conversion there will always be perfect correlation among some of your new covariates, so people generally drop one of the categories (which can be achieved by supplying drop_first=True).
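As a quick illustration of what drop_first does (the column names below are what pandas generates by default):

```python
import pandas as pd

df = pd.DataFrame({"quarter": [1, 2, 3, 4, 1, 2]})
dummies = pd.get_dummies(df, columns=["quarter"], dtype=int, drop_first=True)
# quarter_1 is dropped as the reference level
print(list(dummies.columns))  # ['quarter_2', 'quarter_3', 'quarter_4']
```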


The interpretation of your results also becomes relative in some sense because you drop one category. For instance, in your case let's say you drop quarter 1; then the slopes for quarters 2, 3, and 4 will tell you how much things change in those quarters relative to quarter 1 (and the intercept value in your regression will likely represent some sort of baseline effect for the dropped category). These are explained in more detail in Kruschke, Chapter 15, specifically 15.2.4 Nominal Predictors.
