A Bayesian Approach to Media Mix Modeling by Michael Johns & Zhenyu Wang

Talk Abstract

This talk describes how we built a Bayesian Media Mix Model of new customer acquisition using PyMC3. We will explain the statistical structure of the model in detail, with special attention to nonlinear functional transformations, discuss some of the technical challenges we tackled when building it in a Bayesian framework, and touch on how we use it in production to guide our marketing strategy.


Michael Johns

Michael Johns is a data scientist at HelloFresh US. His work focuses on building statistical models for business applications, such as optimizing marketing strategy, customer acquisition forecasting and customer retention.

Zhenyu Wang

Zhenyu Wang is a Senior Business Intelligence Analyst at HelloFresh International. He works on developing and implementing methods to measure the effectiveness of advertising campaigns using analytic and statistical methods.

This is a PyMCon 2020 talk

Learn more about PyMCon!

PyMCon is an asynchronous-first virtual conference for the Bayesian community.

We have posted all the talks here in Discourse on October 24th, one week before the live PyMCon session, for everyone to see and discuss at their own pace.

If you are available on October 31st, you can register for the live session here! But if you are not, don’t worry: all the talks are already available here on Discourse (keynotes will be posted after the conference), and you can network here on Discourse and on our Zulip.

We value the participation of each member of the PyMC community and want all attendees to have an enjoyable and fulfilling experience. Accordingly, all attendees are expected to show respect and courtesy to other attendees throughout the conference and at all conference events. Everyone taking part in PyMCon activities must abide by the PyMCon Code of Conduct. You can report any incident through this form.

If you want to support PyMCon and the PyMC community but you can’t attend the live session, consider donating to PyMC.

Do you have suggestions to improve PyMCon? We have an anonymous suggestion box waiting for you.

Have you enjoyed PyMCon? Please fill out our PyMCon attendee survey. It is open to both async PyMCon attendees and people taking part in the live session.


Hi Michael and Zhenyu! Really interesting presentation, thank you for sharing! I have a few questions, if you have any time for them:

  1. How has the model performed during COVID? What adjustments were made to account for potentially lower CAC (due to less competition, for example)?
  2. Why did you use spend instead of impressions? Wouldn’t spend typically be dependent on the price of the ad?
  3. How did you account for any multicollinearity? Outliers?
  4. Why roll spend data up to the weekly level? Wouldn’t you want it to be as granular as possible?
  5. How long did it take the model to run? My experience with PyMC3 is that a model can take a long time to run if there is multicollinearity.

Thank you!


Great presentation. Very clear and interesting!

I also have some questions:

  1. Do you ever do hierarchical models for your MMM?
  2. How do you report on CAC when the CAC varies with the level of spend?
  3. How does the model estimate the CAC for areas past the “max spend” part of the curve?
  4. Does scaling and normalizing the data hurt the interpretability of the model output?
  5. Would it be possible to get a copy of sample code so we can run on our own?

Thank you!


Hi Spencer,

Thanks for the great questions! Here are my responses and thoughts:

  1. The model has done surprisingly well during COVID. Our new customer acquisitions are highly correlated with marketing spend. We reduced much of our marketing after the first major wave in the US and acquisitions dropped also. We did some testing on the benefit of adding some additional control variables to account for changes in demand but none of them meaningfully improved the model. In the end, we didn’t have to make any major changes.

  2. We use marketing spend as the primary predictor in large part to encode the CAC into the model coefficients given that ultimately CAC is the metric we are most interested in. Regressing acquisitions on the spend means the CAC is simply 1/beta. Spend works quite well despite the fact that this is a somewhat coarse measure of marketing activity. That said, we are currently exploring the utility of using more traditional predictors, such as impressions, for some channels where spending is less predictive.

  3. Our primary approach to multicollinearity and outliers is to set moderately informative priors in the tuning and refinement phase of model building. This helps somewhat but doesn’t solve the problem entirely.

  4. We use weekly data based largely on availability; there are few channels where daily spending is available or a meaningful predictor from a business perspective. It is also the case that we plan our marketing budgets largely at the weekly level. When it comes to using the model to optimize the marketing mix, the weekly level of granularity ends up being the most business-relevant level for estimation.

  5. The model can definitely take some time to run; upwards of 90 minutes in some instances. We’ve done some testing and the primary reason for the long run time appears to be the transformations more than multicollinearity. The adstock function in particular is a computationally expensive function to fit. We have recently been doing some testing with variational inference to see if we can speed up the process. Our preliminary findings are not very encouraging, however. The complexity introduced by the saturation and adstock functions generally leads to unstable results that can vary from run to run.
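For readers following along, the transformations discussed above can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: the geometric-decay form of adstock, the Hill-style saturation, the function names, and all parameter values are assumptions for demonstration. The toy regression at the end shows the CAC = 1/beta relationship described in reply 2.

```python
import numpy as np

def geometric_adstock(spend, decay, max_lag=8):
    """Carryover effect: each week's effective spend includes
    geometrically decayed spend from earlier weeks."""
    weights = decay ** np.arange(max_lag)
    weights /= weights.sum()                      # normalize so total spend is preserved
    padded = np.concatenate([np.zeros(max_lag - 1), spend])
    return np.array([padded[t:t + max_lag][::-1] @ weights
                     for t in range(len(spend))])

def hill_saturation(x, half_sat, slope):
    """Diminishing returns: the response saturates as spend grows;
    half_sat is the spend level producing half the maximum effect."""
    return x ** slope / (x ** slope + half_sat ** slope)

# Toy CAC illustration: if every $150 of spend yields one acquisition,
# regressing acquisitions on spend recovers beta = 1/150, so CAC = 1/beta.
spend = np.linspace(1000.0, 5000.0, 20)
acquisitions = spend / 150.0
beta = np.polyfit(spend, acquisitions, 1)[0]
cac = 1.0 / beta                                  # ≈ 150
```

In a PyMC3 model these transformations would be expressed with Theano operations so that decay, half_sat, and slope can themselves be sampled, which is where the long run times described above come from.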



Hi Tim,

Thanks for the great questions! My responses and thoughts are below!

  1. We currently do not have any hierarchical versions of the model. Ideally, we would incorporate geographic information hierarchically but generally do not have geo information for enough of our channels to make this possible.

  2. We use fixed effects; our CAC estimate is effectively an average, accounting for other channels and controls. When back-calculating, we let the estimate vary as the spend input varies.

  3. CAC estimates past the max spend point are simple extrapolations based on the model estimates. This can be tricky given that we are predicting over values the model has never seen but the primary goal is to extract insights that can inform budget steering.

  4. Scaling does make the raw model output difficult to interpret directly. Typically, we use the coefficients to produce predicted acquisitions on the original scale and produce summary statistics based on those predicted values to help with interpretability.

  5. Yes, I’d be happy to share the sample code in the talk to help you get started if you provide me your contact information.
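To make point 4 concrete, here is a hypothetical NumPy sketch of recovering interpretable quantities from a fit on standardized spend. The data and variable names are invented for illustration and are not from the talk.

```python
import numpy as np

# Invented weekly data for illustration only.
spend = np.array([1200.0, 1800.0, 2500.0, 3100.0, 4000.0])
acqs = np.array([9.0, 13.0, 17.0, 21.0, 27.0])

# Standardize the predictor before fitting, as is common for samplers.
mu, sd = spend.mean(), spend.std()
z = (spend - mu) / sd

beta_z, intercept_z = np.polyfit(z, acqs, 1)   # coefficient on the z scale

# The z-scale coefficient is hard to read directly; dividing by the
# predictor's standard deviation gives acquisitions per dollar.
beta_per_dollar = beta_z / sd

# Predictions are already in raw acquisition units, so summary
# statistics computed from them are directly interpretable.
predicted = intercept_z + beta_z * z
```

Because standardization is a linear transformation, the back-transformed coefficient and the predictions match what a fit on the raw scale would give.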



Hi Michael! I would also appreciate it if you could share the code/GitHub repo used for this modeling exercise. Could you also provide the code used for the optimization? Below is my contact info.

email: fmiraftab@berkeley.edu

Thank you!!!

Hi Mike,

If you could send the code to me, too, that would be great!



Hi Mike,

Thank you for the amazing presentation! I would love to try running the model on my own - would you share the sample code in the talk?

My e-mail address is: seoyoungkim@uga.edu


Hello Michael and Zhenyu,

Thanks for an amazing talk on applying Bayesian methods to an MMM regression model and sharing your experience. Very informative, and glad to know that the model appears to be stable even during unusual times like this year.

Would appreciate it if you could share sample code to help those of us interested in getting started with a Bayesian approach to MMM.


Hi Mike and Zhenyu,

It was really cool for me to come across your talk right as I’m implementing my own version of a Bayesian MMM based on this Google research paper: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/b20467a5c27b86c08cceed56fc72ceadb875184a.pdf

The problem I’m having, and the one the Google team had as well, is that the adstock function seems to add too much complexity for the model to be estimated well. From your previous replies, it seems you are running into the same problems. Do you have any ideas about how to tackle this in the future?

Also, if it isn’t an inconvenience, I’d love to be able to view your sample code. You can reach me at ncg14b@my.fsu.edu



Hi Mike and Zhenyu,

Really appreciate you putting out this talk! It’s helped me wrap my head around the nuances of using Bayesian methods to run a media mix model. I’m trying to apply this to my own MMM assignment, and if it’s not inconvenient, I would really appreciate it if you could send me your sample code as well.

My contact info is:




Hi Mike and Zhenyu,

Thanks for the presentation; it was really good. Very informative.
Can you please share the sample code? Thanks in advance.
email: nilath@gmail.com


Hi Mike and Zhenyu,

I was able to reproduce the model code from the presentation and got it to run with my data. However, I am fairly unfamiliar with PyMC3, and my Jupyter notebook keeps crashing when I try to sample from the model after creating it. I’ve log-transformed all of my continuous predictors along with my target variable to speed things up, but for some reason many errors pop up when I call pm.sample() on the model, resulting in crashes. Do you have any experience with this or know any resolutions?



You might want to open a separate topic here on Discourse with your code and the error messages you are getting (a minimal example that still reproduces the problem would be best).

I was able to get sampling to run; however, I’m getting some wacky results. I’ve started my own topic, but if I am ever able to take a look at the sample code, it would help me tremendously in interpreting my model!


Hi Mike and Zhenyu,
In my industry, I am implementing frequentist Marketing Mix Models.
Could you please share your example code so I can give it a try with PyMC3?
My contact email: matteo.malosetti@gmail.com

Hi Mike and Zhenyu,

Amazing presentation, really great insights. I’d like to ask a couple of things:

  1. Have you incorporated external factors like competitor activity, halo effects, or cannibalization effects on sales?
  2. Based on your experience, how well can the model explain outcomes using only spend as an input, without seeing the activities that spend funded? The same spend on two different media platforms may have different effects; how does the model learn that?

Could you share sample code so that I can try it on my data as well?
Email : ayush.dhanuka1994@gmail.com

Ayush Dhanuka

Hi Michael and Zhenyu.

Thanks for sharing your presentation. Would it be possible to share your sample code with me as well?




Hey Michael and Zhenyu,
Loved the presentation, looking to follow a similar approach for one of my projects! Would it be possible to get some of the sample code? My contact info is ksengupta99@berkeley.edu.