 # Help with Media Mix Modeling and pymc3

I’m creating a Media Mix Model using pymc3 based on this posting:

A Bayesian Approach to Media Mix Modeling by Michael Johns & Zhenyu Wang - PyMCon / PyMCon2020 - PyMC Discourse

I’m pretty inexperienced with pymc3 and Bayesian modeling in general. I’ve taken my fair share of machine learning courses in college, but most of that material seemed convoluted and hard to apply in practice. I did learn about Bayesian networks, so I understand the gist of how this works, and after weeks of research on pymc3 and Bayesian modeling I’m understanding it more and more…

However, I’ve got some model output that is very difficult for me to decipher. I’m pretty clueless about how to interpret the distributions being assigned to my coefficients, how it all ties together, and whether my model is complete garbage or I’m on the right track.

In my model, I have four channels for media spend, two control variables for internal and competitive pricing, some categorical variables for seasonality (month and year), and a dummy variable for COVID severity. The target variable is reservations, not revenue. I have roughly four years of weekly data, so just over 200 records of sales data. Two of the media channels only have three years of data, whereas the other two have about four.

I’m not using any sort of adstock function on my channels, just to keep things simple, but I am running each one through a saturation function as described in the article.
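For intuition, here is a minimal NumPy sketch of that logistic saturation transform (the same shape as the `logistic_function` used in the model code; the spend values are made up):

```python
import numpy as np

def logistic_saturation(x, mu):
    """Logistic reach curve: 0 at zero spend, approaching 1 as spend grows.
    mu controls how quickly the channel saturates."""
    return (1 - np.exp(-mu * x)) / (1 + np.exp(-mu * x))

# Made-up weekly spend values, from nothing to heavy spend
spend = np.array([0.0, 1.0, 10.0, 100.0])
print(logistic_saturation(spend, mu=0.5))
```

Each channel's contribution is then this saturated spend scaled by its (positive) beta, so diminishing returns are built in.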

For the media spend channels, I’m using a Gamma distribution with alpha = 3 and beta = 1 for mu (the half-saturation point), a HalfNormal with a standard deviation of 5 for beta, and feeding them into a logistic function. Here is the sample code:

```
xx = df_in[channel_name].values

channel_b = HalfNormal(f'beta_{channel_name}', sd=5)

# logistic reach curve
channel_mu = Gamma(f'mu_{channel_name}', alpha=3, beta=1)
response_mean.append(logistic_function(xx, channel_mu) * channel_b)
```

I’m treating the internal and competitor pricing as continuous control variables and use a Normal distribution for the beta with a standard deviation of 5 (I believe he used 0.25 for the sd in his presentation). Here is the code:

```
x = df_in[channel_name].values

control_beta = Normal(f'beta_{channel_name}', sd=5)
channel_contrib = control_beta * x
response_mean.append(channel_contrib)
```

Finally, I’m treating my month and COVID variables as categorical variables. I’m not sure exactly how this portion of the code works, but I copied it from his presentation:

```
for var_name in index_vars:
    x = df_in[var_name].values

    shape_v = len(set(x))

    ind_beta = Normal('beta_' + var_name, sd=.5, shape=shape_v)

    channel_contrib = ind_beta[x]
    response_mean.append(channel_contrib)
```
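If I understand it correctly, `ind_beta[x]` is just a per-category coefficient looked up by integer code. A toy NumPy check of that indexing (the codes and coefficient values here are made up):

```python
import numpy as np

# Four observations with integer month codes (0 = Jan, 1 = Feb, ...)
x = np.array([0, 1, 1, 3])

# One coefficient per category level, like beta_MonthID in the model
ind_beta = np.array([0.5, -0.2, 0.7, 0.1])

# Fancy indexing picks each observation's coefficient, giving a
# per-row contribution to the response mean
channel_contrib = ind_beta[x]
print(channel_contrib)  # [ 0.5 -0.2 -0.2  0.1]
```

So every week gets the coefficient of its month added to the model's mean, which is how the seasonality enters.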

I set the sigma as an exponential as follows:

```
sigma = Exponential('sigma', 10)
```

Lastly, I get the final output like this:

```
likelihood = Normal(outcome, mu=sum(response_mean), sigma=sigma, observed=df_in[outcome].values)
```

`response_mean` is a list that I instantiate at the beginning of the model, which looks like this:

```
with Model() as model:
    response_mean = []
```

Each of the channels is run inside a loop, iterating through a list of the keys indexing my main dataframe. This code can be found in the video presentation, and the logistic function is also defined there.

After creating the model, I call sample like this to fit it (if I’m not mistaken, that’s what pm.sample() does?):

```
with model:
    trace = pm.sample(10000, tune=1000, chains=2, init="adapt_diag", random_seed=SEED, return_inferencedata=True, cores=1)
```

I’ve attached screenshots of plot_trace and plot_posterior:

Finally I try to see how close the posterior predictive is compared to the actual target by running this code:

```
with model:
    ppc = pm.sample_posterior_predictive(trace)
```

```
az.plot_ppc(az.from_pymc3(posterior_predictive=ppc, model=model));
```

I’ve also output the summary metrics with:

```
az.summary(trace)
```

Any help guiding me in the right direction on this would be much appreciated…

If anyone actually took their time to read through and follow along with this, thank you!!!

I have no domain expertise in media mix modeling, so maybe @Michael_Johns or @zwang can help there.

But here are two small tips from your code/screenshots:

1. You can pass `compact=True`, or the `compact_prop` kwarg to `arviz.plot_trace` to combine some variables in the traceplot (e.g. the months). See arviz.plot_trace — ArviZ dev documentation
2. The NUTS sampler has some trouble exploring your posterior distribution: it diverges (slings off toward infinity) in some regions of the parameter space. This can be caused by many things, but is generally a sign that the model specification is odd. You specified `sigma ~ Exponential`, which could be the reason: standard deviations are hard to sample in general, and a lot of probability mass close to 0 is always a problem.

Another way to diagnose divergences is through `arviz.plot_pair(..., divergences=True)` to see if they occur in some special parameter combination.

The nuclear option if the above does not help is `pm.sample(..., target_accept=0.9)`.
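To see how many divergent transitions you are dealing with before reaching for `target_accept`, you can count them from `trace.sample_stats` (with `return_inferencedata=True`, `diverging` is a chain × draw boolean array; the array below is fabricated for illustration):

```python
import numpy as np

# Fabricated stand-in for trace.sample_stats["diverging"].values:
# 2 chains x 10000 draws, with three pretend divergent transitions
diverging = np.zeros((2, 10000), dtype=bool)
diverging[0, [5, 17, 400]] = True

n_div = int(diverging.sum())
print(n_div, "divergent transitions")
```

A handful out of thousands of draws may vanish with a higher `target_accept`; hundreds usually mean the model itself needs rethinking.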

With the posterior predictive I lack the domain expertise as I said…

Have fun diving into the Bayesian modeling world!

Thanks for the pointers, removing the sigma seems to have made the divergences go away!

However, yes… I’m not sure how to interpret the posterior prediction.

Ultimately, I have two goals:

1. to create a new column in my table for the predicted value of y. This would essentially be my mean prediction value, as depicted in this image:

I got this from this article, which I believe was written by Michael_Johns and Zhenyu Wang:

Bayesian Media Mix Modeling using PyMC3, for Fun and Profit | by Luca Fiaschi | HelloTech (hellofresh.com)

2. to create a formula, not unlike what a linear regression produces, where I have my parameters and can plug in new values of the predictors to calculate changes in the predicted value. I believe this is what is discussed in the last section of the presentation and the article above, in calculating ROAS (Return on Ad Spend).
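For goal 1, the predicted column is just the per-week mean over the posterior-predictive draws. A NumPy sketch, with fabricated draws standing in for what `pm.sample_posterior_predictive(trace)[outcome]` returns:

```python
import numpy as np

# Fabricated stand-in for ppc[outcome]: 4000 posterior-predictive
# draws of the outcome for each of 200 weeks
rng = np.random.default_rng(58)
ppc_draws = rng.normal(loc=10.0, scale=1.0, size=(4000, 200))

# Goal 1: mean prediction per week -> becomes a new dataframe column
y_pred = ppc_draws.mean(axis=0)

# A credible interval per week comes along for free
y_lo, y_hi = np.percentile(ppc_draws, [3, 97], axis=0)

print(y_pred.shape)  # (200,)
```

For goal 2, the analogous move would be to take posterior means of the beta/mu parameters from `az.summary(trace)` and plug them into the same saturation-plus-linear formula the model uses, which is what the ROAS calculations in the presentation do.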

I’m not exactly sure what my next steps are for this model; I’m really just trying to understand the nuances of the distributions chosen, why the beta and mu coefficients take the values they do, and what they mean.

Below I’ve attached my code again along with the resulting charts:

```
import numpy as np
import pandas as pd
import pymc3 as pm
from pymc3 import *
import arviz as az
import theano
import theano.tensor as tt

df_in = pd.read_csv('Modeling Table for Python.csv')

SEED = 58

delay_channels = []  # ['Display']
non_lin_channels = ['Metasearch', 'Paid_Search', 'CTC', 'Display']
control_vars = []  # internal / competitor pricing columns go here
index_vars = ['MonthID']  # , 'YearID']
outcome = 'Log Bookings'

def geometric_adstock(x_t, alpha, L=12, normalize=True):
    '''
    :param alpha: rate of decay (float)
    :param L: length of time carryover effects can have an impact (int)
    :param normalize: whether to normalize the decay weights (bool)
    :return: transformed spend vector
    '''
    w = tt.as_tensor_variable([tt.power(alpha, i) for i in range(L)])
    xx = tt.stack([tt.concatenate([tt.zeros(i), x_t[:x_t.shape[0] - i]]) for i in range(L)])

    if not normalize:
        y = tt.dot(w, xx)
    else:
        y = tt.dot(w / tt.sum(w), xx)
    return y

def logistic_function(x_t, mu=0.1):  # saturation function
    '''
    :param x_t: marketing spend vector (float)
    :param mu: half-saturation point (float)
    :return: transformed spend vector
    '''
    # tt.exp keeps this usable on PyMC3 random variables
    return (1 - tt.exp(-mu * x_t)) / (1 + tt.exp(-mu * x_t))

with Model() as model:
    response_mean = []

    # channels that can have DECAY and SATURATION effects
    for channel_name in delay_channels:
        xx = df_in[channel_name].values

        channel_b = HalfNormal(f'beta_{channel_name}', sd=5)  # keep coefficients positive

        alpha = Beta(f'alpha_{channel_name}', alpha=3, beta=3)
        channel_mu = Gamma(f'mu_{channel_name}', alpha=3, beta=1)
        # carryover, then saturation
        response_mean.append(
            logistic_function(geometric_adstock(xx, alpha), channel_mu) * channel_b)

    # channels that can have SATURATION effects only
    for channel_name in non_lin_channels:
        xx = df_in[channel_name].values

        channel_b = HalfNormal(f'beta_{channel_name}', sd=5)

        # logistic reach curve
        channel_mu = Gamma(f'mu_{channel_name}', alpha=3, beta=1)
        response_mean.append(logistic_function(xx, channel_mu) * channel_b)

    # continuous control variables
    if control_vars:
        for channel_name in control_vars:
            x = df_in[channel_name].values

            control_beta = Normal(f'beta_{channel_name}', sd=.25)
            channel_contrib = control_beta * x
            response_mean.append(channel_contrib)

    # categorical control variables
    if index_vars:
        for var_name in index_vars:
            x = df_in[var_name].values

            shape_v = len(set(x))

            ind_beta = Normal('beta_' + var_name, sd=.5, shape=shape_v)

            channel_contrib = ind_beta[x]
            response_mean.append(channel_contrib)

    # noise level
    # sigma = Exponential('sigma', 10)

    # define likelihood
    likelihood = Normal(outcome, mu=sum(response_mean), observed=df_in[outcome].values)

with model:
    trace = pm.sample(10000, tune=1000, chains=2, init="adapt_diag",
                      random_seed=SEED, return_inferencedata=True, cores=1)
```

```
az.plot_trace(trace, compact=True)
```

```
az.summary(trace)
```

```
with model:
    ppc = pm.sample_posterior_predictive(trace, random_seed=SEED)

az.plot_ppc(az.from_pymc3(posterior_predictive=ppc, model=model));
```

Again, thanks for the help!

Hi, I am building the same media mix model, but it gives me the following error (summarized):

```
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV `beta_Seasonality3`.ravel() is zero.
The derivative of RV `mu_Banners_log__`.ravel() is zero.
The derivative of RV `mu_Radio_log__`.ravel() is zero.
The derivative of RV `alpha_Radio_logodds__`.ravel() is zero.
The derivative of RV `mu_TV_log__`.ravel() is zero.
The derivative of RV `alpha_TV_logodds__`.ravel() is zero.
```

The likelihood is built with pm.Normal.

In the dataset I have float 0.0 (zero) values, because there are some weeks with zero spend, and the seasonality values are one-hot encoded as floats 0.0 and 1.0.

Is there a different distribution than the Normal that can handle a dataset with zero values?
Thanks,

Marcello