Hi! I’m working on a hierarchical MMM (marketing mix) model where I have sub-channels within media channels. For example:
Display channel has 3 publishers (e.g., Google, Exchange A, etc.)
Social channel has 4 publishers (e.g., Pinterest, Instagram, etc.)
Linear channel has 5 networks (e.g., ABC, CNN, etc.)
I’m struggling with how to set up the model and hierarchical structure. Specifically, I want to:
Allow each subchannel (e.g., Google, ABC) to have its own media effectiveness (beta)
Handle the fact that the number of subchannels differs by channel, so the hierarchy isn’t balanced
I’m using PyMC for model implementation, and I’m unclear on how to handle the fact that the hierarchy isn’t balanced and that the publishers/networks are specific to the channels. Do I need to build channel-specific index mappings? If yes, how do I apply it to the model?
Most code examples on the web for hierarchical MMM are geo-focused (e.g., country-level or region-level groupings), but my use case is structural within media channels — for instance, different types of publishers or networks within each channel. I haven’t found examples that address how to handle nested subgroups that differ in structure and depth per parent channel.
Any code examples or advice would be incredibly helpful!
I don’t think the article above addresses your question - it seems to be geo-focused.
The way I would conceptualise this: in MMMs, the hierarchy is often defined on the dependent variable, for example sales in region A from customers aged 20-25. That particular cross-section has a level of media exposure that is unique to it. In these cases it’s relatively straightforward to set up the hierarchies because they are balanced, and xarray is usually a good option.
But you want the media channels themselves to have sub-channels, and these sub-channels are not balanced.
Do you also have hierarchies at the dependent variable level? Or are you just looking at total sales in each week?
My first thought, and it might be an ok place to start depending on the maximum number of subchannels that exist, would be to use a generic labelling for sub channels. For example, subchannel 1, subchannel 2, …, subchannel n, where n is the maximum number of subchannels across the higher channels. In the example you gave, n would be 5.
Then, for the channels with fewer than n subchannels, you can just zero out the unused slots so that the xarray is balanced. That data will never contribute to the dependent variable, so it might as well not be there, and the coefficient applied to it is meaningless. The downside is some wasted compute.
You would then just need a mapping for each larger channel that defines what each generic slot is, e.g. subchannel 1 for Display is Google.
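To make the padding idea concrete, here is a minimal NumPy sketch. Everything in it is an assumption for illustration: the spend data is random, the publisher and network names beyond the ones in the question are placeholders, and the boolean mask is my own addition so the unused slots can be ignored (or their coefficients fixed at zero) later.

```python
import numpy as np

# Channel -> subchannel layout from the question; names ending in a
# number are placeholders for publishers/networks that were not listed.
subchannels = {
    "display": ["google", "exchange_a", "display_3"],
    "social": ["pinterest", "instagram", "social_3", "social_4"],
    "linear": ["abc", "cnn", "linear_3", "linear_4", "linear_5"],
}

n_weeks = 4
rng = np.random.default_rng(0)
# Placeholder spend: one (n_weeks, n_subchannels) array per channel.
spend = {
    ch: rng.gamma(2.0, 1.0, size=(n_weeks, len(subs)))
    for ch, subs in subchannels.items()
}

channels = list(subchannels)
max_sub = max(len(subs) for subs in subchannels.values())  # n in the text

# Balanced array: unused slots stay at zero.
padded = np.zeros((n_weeks, len(channels), max_sub))
# Mask of real (channel, slot) pairs.
mask = np.zeros((len(channels), max_sub), dtype=bool)
for i, ch in enumerate(channels):
    k = len(subchannels[ch])
    padded[:, i, :k] = spend[ch]
    mask[i, :k] = True

# Mapping from generic slot back to the real name,
# e.g. ("display", 0) -> "google".
slot_names = {
    (ch, j): name
    for ch in channels
    for j, name in enumerate(subchannels[ch])
}
```

The padded array can then be given balanced dims such as (week, channel, subchannel_slot), and the mask tells you which coefficients to read when you interpret the results.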
I’m not sure if this is best practice, but you can also generate a separate index for each channel, i.e., you have idx: Display, with values [Google, subchannel 2, etc.], and then a separate index for the 2nd channel.
Then you’d need to either manually add the channels or loop over them in the actual model. A Python loop would get unrolled into the model graph, but assuming you have a smallish number of channels that would probably be fine.