Bayesian Hierarchical Modeling - report coefficients for multiple levels

Hi,

first of all, thank you very much for this awesome library. It is really great and I enjoy using it. I am still pretty new to Bayesian Modelling so I hope my questions are not too stupid.
I have some panel data about monthly sales (dependent variable) in 20 cities for 5 years as well as marketing and other activities done in those (independent variables/control variables). For each city I also have the corresponding county and state information as an additional variable. The sales variable represents the number of packages of a specific product sold.
I would like to use this data for marketing mix modelling and based on my research hierarchical bayesian regression is the way to go. So I started implementing the model and currently my model code looks as follows:

# Hyperprios
mu_0 = pm.Normal('mu_intercept', mu=5, sigma=1)

mu_1 = pm.HalfCauchy('mu_act1', beta=0.5)

mu_2 = pm.HalfCauchy('mu_act2', beta=0.5)

mu_3 = pm.HalfCauchy('mu_act3', beta=0.5)

mu_4 = pm.HalfCauchy('mu_act4', beta=0.5)

mu_23 = pm.HalfCauchy('mu_sales_lag_1', beta=1)

mu_26 = pm.HalfCauchy('mu_ext', beta=1)

mu_27 = pm.Normal('mu_market', mu=5, sigma=1)


# Priors, the coefficients are expected to be rather small, that's why the HalfNormal distribution was chosen
beta_0 = pm.HalfNormal('intercept', sigma=mu_0, shape=n_cities)
# marketing activities
beta_1 = pm.HalfNormal('act1', sigma= mu_1, shape=n_cities)
beta_2 = pm.HalfNormal('act2', sigma=mu_2, shape=n_cities)
beta_3 = pm.HalfNormal('act3', sigma=mu_3, shape=n_cities)
beta_4 = pm.HalfNormal('act4', sigma=mu_4, shape=n_cities)
# control variables
beta_23 = pm.HalfNormal('sales_lag_1', sigma=mu_23, shape=n_cities)
beta_26 = pm.HalfNormal('ext', sigma=mu_26, shape=n_cities)
beta_27 = pm.HalfNormal('market', sigma=mu_27, shape=n_cities)

# data & hierarchical index of city level 
mu = beta_0[city_idx] + beta_1[city_idx] * X['act1']+ beta_2[city_idx] * X['act2'] + beta_3[city_idx] * X['act3']+ beta_4[city_idx] * X['act4'] + beta_23[city_idx] * X['sales_lag_1'] + beta_26[city_idx] * X['ext'] + beta_27[city_idx] * X['market']   

Y_obs = pm.Poisson('Y_obs', mu=mu, observed=y)

The model works and I get the coefficients for all cities in the data, but I would like to report the coefficients on different hierarchical levels as well, e.g. per county and state, and also for the whole country. Could you point me into the right direction on how to best do that? Also, how could I get the coefficients for the national level?
Also, of course, any feedback on the code above is highly appreciated :slight_smile:

Thank you very much in advance!
Ben

1 Like

When you ask for “coefficients per county” and “per state”, do you mean you want to add those variables as levels in a hierarchical model that resembles (at it’s lowest level) the model you currently have? If so, I would encourage you to check out the hierachical regression examples here and here. I also like the baseball example from Kruschke’s DBDA book.

But to quickly sketch out how you might adapt your model, you might try something along these lines:

mu_0_national = pm.Normal('mu_0', mu=5, sigma=1)
mu_1_national = pm.HalfCauchy('mu_1', beta=0.5)

beta_0_state = pm.HalfNormal('beta_0_state', sigma=mu_0_national, shape=n_states)
beta_1_state = pm.HalfNormal('beta_1_state', sigma=mu_1_national, shape=n_states)

beta_0_city = pm.HalfNormal('beta_0_city', sigma=beta_0_state[city_to_state_mapping], shape=n_cities)
beta_1_city = pm.HalfNormal('beta_1_city', sigma=beta_1_state[city_to_state_mapping], shape=n_cities)

mu = beta_0_city[city_idx] + (beta_1_city[city_idx] * X['act1'])

I have continued with the half normal distributions you used. Your comment suggests that these priors are being used because the coefficients are expected to be small. But a half normal constraints your parameters to be positive, not small. Just an observation.

Also, please note that you should be very thoughtful about modeling hierarchically-structured locality data like this (cities, states, country). For example, only modeling states for which you happen to have city-level data will have strong (likely bad) consequences for inferences at the other levels of the model. For example, if you have data from 20 cites, all of which are located in a single state, your “national” level parameter should reflect lots of uncertainty (because you have no idea what the other states are like) and should not necessarily expected to look much like that one arbitrary state. Once you have played around quite a bit, I might suggest looking into Mister P (aka, multilevel regression with poststratification, detailed in Gelman’s BDA book).

2 Likes

Thank you so much for your quick answer.

Yes, that is correct. I basically would like to report the effect of the activities I mentioned on different hierarchical levels. So for instance I would like to understand the effect of those activities per cities, but also per counties, states, etc. Thank you very much for the sketch and also for the references! I had a look at the examples before for a quite a bit but could not get to understand how to do it. Your example helps to clarify things a lot, thank you!

Thank you also for the hint with regards to the priors (you are absolutely right, that was not very concise, but exactly what I meant) and the data structure, that makes a lot of sense. I will see whether I can fix that and test everything out and come back with an update. Thank you very much!

1 Like

I could manage to improve my data with more cities from different states which I hope will make the model more meaningful. Following your sketch I could also manage to get the model set up and running and the results seem ok for now - thank you very much! If anyone else is looking for how to create the indices, this link helped me to get it right.

Some follow-up question came up in my mind, though: how could I make predictions based on national/state/city level? Following this reference, I would assume that for the city level, let’s assume for the city with index 7, I could use something as follows:

mu_0_national = pm.Normal('mu_0', mu=5, sigma=1)
mu_1_national = pm.HalfCauchy('mu_1', beta=0.5)

beta_0_state = pm.HalfNormal('beta_0_state', sigma=mu_0_national, shape=n_states)
beta_1_state = pm.HalfNormal('beta_1_state', sigma=mu_1_national, shape=n_states)

beta_0_city = pm.HalfNormal('beta_0_city', sigma=beta_0_state[city_to_state_mapping], shape=n_cities)
beta_1_city = pm.HalfNormal('beta_1_city', sigma=beta_1_state[city_to_state_mapping], shape=n_cities)

mu = beta_0_city[city_idx] + (beta_1_city[city_idx] * X['act1'])

Y_obs = pm.Poisson('Y_obs', mu=mu, observed=y)

pred = pm.Poisson('Y_pred_city', mu=beta_0_city[7] + (beta_1_city[7] * X['act1']), observed=y)

For a state, I would assume I could just use mu= beta_0_state[0] + (beta_1_state[0] * X[‘act1’]) in the last line, and for the national level I would just use mu= beta_0_national + (beta_1_national * X[‘act1’]). Is this correct/does this make sense? Is there also a way to do it based on the built model?

Thank you very much again!

1 Like

This line will not give you what you predictions per se, because you have declared that variable to be observed (and is redundant with the preceding line):

pred = pm.Poisson('Y_pred_city', mu=beta_0_city[7] + (beta_1_city[7] * X['act1']), observed=y)`

It’s likely that you will want to interrogate the trace object generated during sampling to generate your predictions, though the details will depend on exactly what you are trying to predict (new data from city 7? counterfactual conditions in which city 7 had different characteristics?). To just get the predicted mean for city 7, for example, you can just grab sampled values of beta_0_city[7] and beta_1_city[7] and plug them into scipy.stats.poisson. We had some related discussion here. You might also take a look at the documentation for pymc3.model.set_data(). That’s sometimes cleaner/easier depending on your needs (e.g., when trying to predict out of sample).

Thank you so much for the clarification, I really appreciate it. The objective is to predict for new data on all the cities to understand the expected sales, but also to aggregate those up on state and national level. My current approach is to do the predictions on city-level and sum those up per state/nationally. Do you think this a valid approach and/or is there a better way? I have the feeling I am missing something here as I also have the state-/national-level coefficients.

I also really like the set_data() approach, this is really very neat - thank you very much for sharing! Just to make sure I understood it correctly: I can define the city_index via set_data and use this in the sample_posterior_predictive function to predict for a specific city, right?

Thank you very much!

This would very much depend on exactly what you want to predict. As just a bit of an illustrative example, if you are only ever concerned with a fixed set of cities, all of which you have past data from, then the state and national levels of the model will provide some shrinkage/regularization, but may not be useful in and of themselves. On the other hand, if you are trying to predict data for a city that you have no past data for, but which is located in a state where you do have data (e.g., from other cities), then the state-level parameters will be more directly useful.

Again, it’s hard to know without getting deeper into your data and model, but I would assume that your data would be the contents of X.

Once you’re dealing with hierarchical models, prediction becomes very flexible and thus very powerful. But that also means that there is no longer a single obvious way to extract “the predictions” because there are many different kinds of predictions one can extract.

Thanks a lot again for your quick and very helpful response. I see what you are referring to, apologies for the unconcise phrasing of my question. I hope the following helps:

My main objective is to predict sales on national level (meaning all cities summed up). My data is on city level and I am only interested in this fixed set of cities which is in my dataset - so bascially the first case you mentioned applies. If the results are valid on city level, also predicting the sales on city level would be desired, but the main objective for now is to get good results for the national level. So if I understood you correctly, for this setting you would recommend to do the predictions for each city and then sum those up, correct?

Thank you so much, I really appreciate your help!

Yes, that sounds reasonable. If you are only concerned about “observed” cities and states, then many of your potential choices are simplified (because the national-level parameters will be strongly related to the state- and city-level parameters). As I said previously, the use of a hierarchical model in these sorts of amounts to injecting a bit of regularization and will yield a bit of shrinkage (allowing inferences about each individual city to be partially informed by the data from all other cities). Sounds like you are on the right track.