Hierarchical model w/ categorical and continuous predictors

Phil_Clark1 · March 25, 2024, 1:28am

Hello! I am struggling to specify my hierarchical model in PyMC5. I am sure that part of the problem is that I am relatively new to bayesian analysis, but I am also struggling with the code syntax. Using the data below, the goal is to predict soda_sales. Here is an example of how the data looks:

theater_id	region	quarter	candy_sales	soda_sales
1	North	1	1400	58391
1	North	2	1234	59148
1	North	3	5423	68582
1	North	4	1345	59017
2	East	1	6541	49275
2	East	2	9541	10043
2	East	3	1348	49294
2	East	4	5913	23454
3	West	1	5911	32048
3	West	2	1396	48940
3	West	3	6032	23442
3	West	4	6926	34990

Here is my code for a basic hierarchical model:

theater_idx, theaters = df.theater_id.factorize()
coords = {'theater': theaters}

with pm.Model(coords=coords) as partial_pooled_model:
    mu_pooled_params = pm.find_constrained_prior(
        pm.Normal,
        lower=0,
        upper=400,
        init_guess={'mu':200, 'sigma': 50},
    )
    mu_pooled = pm.Normal('mu_pooled', **mu_pooled_params)

    sigma_pooled_params = pm.find_constrained_prior(
        pm.HalfNormal,
        lower=1,
        upper=75,
        init_guess={'sigma':50}
    )
    sigma_pooled = pm.HalfNormal('sigma_pooled', **sigma_pooled_params)


with partial_pooled_model:
    mu = pm.Normal('mu', mu=mu_pooled, sigma=sigma_pooled, dims='theater')
    sigma = pm.HalfNormal('sigma', sigma=50)
        
    y_hat = pm.Normal('y_hat', mu=mu[theater_idx], sigma=sigma, observed=y)

pm.model_to_graphviz(partial_pooled_model)

This model does an excellent job of finding separate mu’s for each theater. But now I want to extend the model to include the other predictors in my data and I am totally lost. How do I update y_hat to include the other partially-pooled categorical variables as well as the continuous variables? I’ve read through the blog posts and other questions here but I just cant crack it. Would love any insight you can offer.

Thanks!

iavicenna · March 25, 2024, 12:33pm

How do you want to extend it? Do you want to include both candy and soda sales in the model to be predicted and add a quarter wise effect? Note that region and theatre ids are completely correlated (i.e there is only one theatre in the north etc) so there is no sense of trying to add that as a separate predictor unless you have a larger data set with more theatres per region. In any case

Adding candy:

This is relatively straight forward and can be done by just adding an extra coordinate with two elements:

coords = {"theater": theaters, "items": ["candy","soda"]}

Then when defining you priors, you have to specify dim=“items” and also make sure that you are supplying correct number of parameters. For instance to create two normals centred at a two different means you can do:

pm.Normal("mu_pooled", [candy_mu, soda_mu], [candy_sd, soda_sd], dims="items")

You can either guess these inputs or get them from find_constrained_prior as you did. You should also set your final mu as

mu = pm.Normal('mu', mu=mu_pooled, sigma=sigma_pooled, dims=["theater","items"])

You should also define the sigma for your likelihood for each item since they are on different scales. Or alternatively you can get rid of likelihood sigma and just use Poisson likelihood which I think will be more natural since your data is discrete:

y_hat = pm.Poisson('y_hat', mu=mu[theater_idx,:], observed=y)

For adding the quarter as a predictor, you have to first decide how you want to add that to the model. Like an overall quarterly increase or decrease of sales of everything for instance? Making something more complicated than this will probably run into identifiability issues (such as region x quarter wise effect) if your real data is not larger with less correlated predictors than what you show here.

ps: A good place to look at for good examples of hierarchical models would be chapter 9 of John Kruschke - Doing Bayesian Data Analysis. Fortunately there are pymc adaptations of the code shown there too:

https://nbviewer.org/github/cluhmann/DBDA-python/blob/master/Notebooks/Chapter%209.ipynb

Phil_Clark1 · March 25, 2024, 1:57pm

Thank you so much for the reply! You are correct that the dataset is much larger and there are many distinct theater_ids within each region - so these should not be perfectly correlated.

To clarify, I would like to predict soda_sales with the other variables as predictors. The region, quarter, and theater_id are categories, but candy_sales is continuous (as is the target, soda_sales).

My main sticking point is how I calculate mu given one hierarchical variable (theater_ids) as well as some additional categorical and continuous predictors. Right now I am treating all the categorical variables are hierarchical and just adding everything together in mu, as below. How else could I do it if I just wanted one hierarchical variable theater_id and the rest to be treated as typical predictors in a regression?

region_idx, regions = X.region.factorize()
coords['region'] = regions

type_idx, types = X.type.factorize()
coords['type'] = types

quarter_idx, quarters = X.quarter.factorize()
coords['quarter'] = quarters

with pm.Model(coords=coords) as model:
    
    #Region
    regional_mu_param = pm.Normal('regional_mu_param', mu=200, sigma=50)
    regional_sigma_param = pm.HalfNormal('regional_sigma_param', sigma=50)
    regional_mu = pm.Normal('regional_mu', mu=regional_mu_param, sigma=regional_sigma_param, dims='region')

    #Type
    type_mu_param = pm.Normal('type_mu_param', mu=200, sigma=50)
    type_sigma_param = pm.HalfNormal('type_sigma_param', sigma=50)
    type_mu = pm.Normal('type_mu', mu=type_mu_param, sigma=type_sigma_param, dims='type')

    #Quarter
    quarter_mu_param = pm.Normal('quarter_mu_param', mu=200, sigma=50)
    quarter_sigma_param = pm.HalfNormal('quarter_sigma_param', sigma=50)
    quarter_mu = pm.Normal('quarter_mu', mu=quarter_mu_param, sigma=quarter_sigma_param, dims='quarter')

    #Theater
    theater_mu_param = pm.Normal('theater_mu_param', mu=200, sigma=50)
    theater_sigma_param = pm.HalfNormal('theater_sigma_param', sigma=50) 
    theater_mu = pm.Normal('theater_mu', mu=theater_mu_param, sigma=theater_sigma_param, dims='theater')

    #Candy sales
    candy_beta = pm.Normal('candy_beta', mu= 150, sigma=25)
    
    #Combine to calculate mu
    mu = theater_mu[theater_idx] + regional_mu[region_idx] + type_mu[type_idx]  + quarter_mu[quarter_idx] + candy_beta * X['candy_sales']
    sigma = pm.HalfNormal('sigma', sigma=50)
    
    y_hat = pm.Normal('y_hat', mu=mu, sigma=sigma, observed=y)

iavicenna · March 25, 2024, 3:07pm

If by that you mean you want to use categorical values as your observed covariates for regression and have like slope coefficients that represent the effect size of those then you are looking at regression with categorical variables:

The main idea is to take your categories and code them as some sort of dummy variables (have a look at the examples in the link above) and use them as the covariates. For instance if your categories were male and female then for each entry in your data you would code this by two new numerical covariates: either [1, 0] or [0, 1] first one representing male, the latter female. In pandas you can achieve this via for instance:

pd.get_dummies(df,columns=[ "quarter"], dtype=int)

There are however subtle points you need to pay attention to (they would definitely create problems for likelihood based frequentists statistics, I would assume might also do so for Bayesian unless you have very informed priors). For instance when you do such a conversion, there will always be perfect correlation among some of your new covariates therefore generally people drop one of the categories (which can be achieved by supplying drop_first = True). See:

https://www.learndatasci.com/glossary/dummy-variable-trap/

Interpretation of your results also become in some sense relative because you drop one category. For instance in your case lets say you drop quarter 1, then slopes for quarters 2,3,4 will tell you how much things change in these quarters relative to 1 (and the intercept value in your regression will likely represent some sort of average effect for dropped covariates). These are explained more in detail in Kruschke Chapter 15, more specifically 15.2.4 Nominal Predictors.

Topic		Replies	Views
Hierarchical Model with Multiple Predictors	5	1673	August 15, 2023
Hierarchical model, categorical outcome Questions	0	539	September 15, 2020
Multilevel hierarchical modelling with labels - how to set data for prediction v5 development , modeling	0	181	February 24, 2024
Hierarchical Modeling MMM with Geo-Data in PyMC v5 modeling , hierarchical	2	983	June 11, 2024
Hierarchical regression models for ratings data ( 2 by 2 within-subject design) Questions	3	1741	December 12, 2019

Hierarchical model w/ categorical and continuous predictors

Related topics