Help with Media Mix Modeling and pymc3

I’m creating a Media Mix Model using pymc3 based on this posting:

A Bayesian Approach to Media Mix Modeling by Michael Johns & Zhenyu Wang - PyMCon / PyMCon2020 - PyMC Discourse

I’m pretty inexperienced with pymc3 and Bayesian modeling in general. I’ve taken my fair share of machine learning courses in college, but most of that material seemed convoluted and hard to apply in practice. I did learn about Bayesian networks, so I understand the gist of how this works, and after weeks of research on pymc3 and Bayesian modeling it is making more and more sense…

However, I’ve got some model output that is very difficult for me to decipher. I’m not sure how to interpret the distributions assigned to my coefficients, how it all ties together, or whether my model is complete garbage or I’m on the right track.

In my model, I have four channels for media spend, two control variables for internal and competitive pricing, and some categorical variables for seasonality such as month and year, along with a dummy variable for COVID severity. The target variable in this case is reservations, not revenue, and I have roughly four years of weekly data, so just over 200 records of sales data. Two of the media channels have only three years of data, whereas the other two have about four.

I’m not using any adstock functions on my channels, just to keep things simple, but I am running each one through a saturation function as described in the article.

For the media spend channels, I’m using a Gamma distribution with alpha=3 and beta=1 for the half-saturation point mu, and a HalfNormal with a standard deviation of 5 for the beta coefficient, and feeding them into a logistic function. Here is the sample code:

xx = df_in[channel_name].values

print(f'Adding Non-Linear Logistic Channel: {channel_name}')
channel_b = HalfNormal(f'beta_{channel_name}', sd=5)  # keeps the coefficient positive

# logistic reach (saturation) curve
channel_mu = Gamma(f'mu_{channel_name}', alpha=3, beta=1)
response_mean.append(logistic_function(xx, channel_mu) * channel_b)
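To check my intuition about what these priors imply, I sampled them with plain NumPy (no pymc3 involved; this is just a sketch of the prior means, not part of the model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Gamma(alpha=3, beta=1) prior on the half-saturation point mu:
# mean = alpha / beta = 3, so a priori each channel is expected to
# reach half-saturation around a spend of ~3 (in the spend's units).
mu_draws = rng.gamma(shape=3.0, scale=1.0, size=100_000)  # numpy's scale = 1/beta
print(mu_draws.mean())  # ≈ 3.0

# HalfNormal(sd=5) prior on the channel coefficient beta:
# mean = 5 * sqrt(2/pi) ≈ 3.99, and all mass is positive,
# which keeps each channel's contribution non-negative.
beta_draws = np.abs(rng.normal(0.0, 5.0, size=100_000))
print(beta_draws.mean())  # ≈ 3.99
```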

I’m treating the internal and competitor pricing as continuous control variables and using a Normal distribution for the beta with a standard deviation of 5 (I believe he used 0.25 for the sd in his presentation). Here is the code:

x = df_in[channel_name].values
        
print(f'Adding Control: {channel_name}')
        
control_beta = Normal(f'beta_{channel_name}', sd=5)
channel_contrib = control_beta * x
response_mean.append(channel_contrib)

Finally, I’m treating my Month and COVID variables as categorical. I’m not sure exactly how this portion of the code works, but I copied it from his presentation:

for var_name in index_vars:
     x = df_in[var_name].values

     shape_v = len(set(x))

     print(f'Adding Index Variable: {var_name}')

     ind_beta = Normal('beta_' + var_name, sd=.5, shape=shape_v)

     channel_contrib = ind_beta[x]
     response_mean.append(channel_contrib)
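From experimenting, I think the `ind_beta[x]` line is just NumPy-style integer indexing: one coefficient per category, broadcast out to one value per observation. A small sketch with made-up numbers of what I believe it does:

```python
import numpy as np

# Suppose the index variable is coded 0..3 and there is one
# coefficient per category (4 categories here for brevity).
ind_beta = np.array([0.5, -0.2, 0.1, 0.3])  # per-category coefficients
x = np.array([0, 1, 1, 3, 2, 0])            # category code for each week

# ind_beta[x] picks the coefficient for each week's category,
# producing one contribution per observation.
channel_contrib = ind_beta[x]
print(channel_contrib)  # [ 0.5 -0.2 -0.2  0.3  0.1  0.5]
```

In the pymc3 model the same indexing happens symbolically, which is why `shape=shape_v` gives the Normal one entry per category.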

I set the sigma as an exponential as follows:

sigma = Exponential('sigma', 10)

Lastly, I define the likelihood like this:

likelihood = Normal(outcome, mu=sum(response_mean), sigma=sigma, observed=df_in[outcome].values)

response_mean is a list that I instantiate at the beginning of the model, which looks like this:

with Model() as model:
     response_mean = []

Each of the channels is run inside a loop, iterating through a list whose keys index my main dataframe. This code can be found in the video presentation, and the logistic function is also defined there.

After creating the model, I call sample like this to fit it (if I’m not mistaken, that’s what pm.sample() does?):

with model:
     trace = pm.sample(10000, tune=1000, chains=2, init="adapt_diag", random_seed=SEED, return_inferencedata=True, cores=1)

I’ve attached screenshots of the plot trace and plot_posterior:

Finally I try to see how close the posterior predictive is compared to the actual target by running this code:

with model:
     ppc = pm.sample_posterior_predictive(trace)

az.plot_ppc(az.from_pymc3(posterior_predictive=ppc, model=model));

Plot_PPC

I’ve also outputted the summary metrics with:

az.summary(trace)

Any help guiding me in the right direction on this would be much appreciated…

If anyone actually took their time to read through and follow along with this, thank you!!!

I have no domain expertise in media mix modeling, so maybe @Michael_Johns or @zwang can help there.

But here are two small tips from your code/screenshots:

  1. You can pass compact=True, or the compact_prop kwarg to arviz.plot_trace to combine some variables in the traceplot (e.g. the months). See arviz.plot_trace — ArviZ dev documentation
  2. The NUTS sampler has some trouble exploring your posterior distribution: It diverges (slings off into infinity) in some places of the parameter space. This can be caused by many things, but is generally a sign that the specification of the model is odd. You specified sigma ~ Exponential, which could be the reason. Standard deviations are hard to sample in general, and high probability close to 0 is always a problem.
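To see why that prior crowds sigma against zero, here is a quick NumPy sketch (note that pymc3's Exponential takes a rate `lam`, so `Exponential('sigma', 10)` has mean 1/10):

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential('sigma', 10) in pymc3 means lam=10, i.e. mean 1/10 = 0.1.
draws = rng.exponential(scale=0.1, size=100_000)  # numpy's scale = 1/lam
print(draws.mean())           # ≈ 0.1

# The mode of an Exponential is exactly 0, so a lot of prior mass sits
# right next to sigma = 0 -- the region where NUTS tends to diverge.
print((draws < 0.05).mean())  # ≈ 0.39: nearly 40% of draws below 0.05
```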

Another way to diagnose divergences is through arviz.plot_pair(..., divergences=True) to see if they occur in some special parameter combination.

The nuclear option if the above does not help is pm.sample(..., target_accept=0.9).

As for the posterior predictive, I lack the domain expertise, as I said…

Have fun diving into the Bayesian modeling world!

Thanks for the pointers, removing the sigma seems to have made the divergences go away!

However, yes… I’m not sure how to interpret the posterior prediction.

Ultimately, I have two goals:

  1. To create a new column in my table for the predicted value of y. This would essentially be my mean prediction value, as depicted in this image:

I got this from this article which I believe was written by Michael_Johns and Zhenyu Wang

Bayesian Media Mix Modeling using PyMC3, for Fun and Profit | by Luca Fiaschi | HelloTech (hellofresh.com)

  2. To create a formula, not unlike what a linear regression produces, where I have my parameters and can plug in new values of the predictors to calculate changes in the predicted value. I believe this is what the last section of the presentation and the article above discuss when calculating ROAS (Return On Ad Spend).
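Here is how I currently picture both goals as plain NumPy (the array shapes and the beta_hat/mu_hat numbers are made up for illustration, not taken from my actual trace):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Stand-in for ppc[outcome]: posterior predictive draws of shape
# (n_posterior_samples, n_weeks). In the real model this would come
# from pm.sample_posterior_predictive(trace).
n_weeks = 5
ppc_outcome = rng.normal(loc=10.0, scale=1.0, size=(2000, n_weeks))

df = pd.DataFrame({'week': range(n_weeks)})
# Goal 1: the mean prediction per week is the column-wise mean
# over the posterior samples.
df['y_pred'] = ppc_outcome.mean(axis=0)

# Goal 2: a plug-in "formula" for what-if spend, using posterior-mean
# parameter values (hypothetical numbers here) and the same logistic
# saturation as the model.
beta_hat, mu_hat = 2.0, 3.0  # e.g. posterior means for one channel

def channel_response(spend):
    return beta_hat * (1 - np.exp(-mu_hat * spend)) / (1 + np.exp(-mu_hat * spend))

print(channel_response(0.0))  # 0.0 -- no spend, no contribution
```

I realize plugging in posterior means throws away the uncertainty; evaluating the response over all posterior draws would give an interval instead of a point estimate.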

I’m not exactly sure what my next steps are for this model, but I’m really just trying to understand the nuances of the chosen distributions, why the beta and mu coefficients take the values they do, and what they mean.

Below I’ve attached my code again along with the resulting charts:

import numpy as np
import pandas as pd
import pymc3 as pm
from pymc3 import *
import arviz as az
import theano
import theano.tensor as tt

df_in = pd.read_csv('Modeling Table for Python.csv')

SEED = 58

#print(df_in.keys())
#print(df_in.shape)
delay_channels = []  # ['Display']
non_lin_channels = ['Metasearch', 'Paid_Search', 'CTC', 'Display']
control_vars = ['Log ADR', 'Log Compset ADR'] # , 'COVID-19_ID']
index_vars = ['MonthID']#, 'YearID']
outcome = 'Log Bookings'


def geometric_adstock_tt(x_t, alpha=0, L=12, normalize=True):
    '''
    :param alpha: rate of decay (float)
    :param L: Length of time carryover effects can have an impact (int)
    :normalize: Boolean
    :return: transformed spend vector
    '''
    w = tt.as_tensor_variable([tt.power(alpha, i) for i in range(L)])     
    xx = tt.stack([tt.concatenate([tt.zeros(i), x_t[:x_t.shape[0] - i]]) for i in range(L)])

    if not normalize:
        y = tt.dot(w, xx)
    else:
        y = tt.dot(w / tt.sum(w), xx)
    return y


def logistic_function(x_t, mu=0.1):  # saturation function
    '''
    :param x_t: marketing spend vector (float)
    :param mu: half-saturation point (float)
    :return: transformed spend vector
    '''
    # tt.exp keeps this symbolic, since mu is a random variable in the model
    return (1 - tt.exp(-mu * x_t)) / (1 + tt.exp(-mu * x_t))


with Model() as model:
    response_mean = []

    # channels that can have DECAY and SATURATION effects
    for channel_name in delay_channels:
        print(channel_name)
        xx = df_in[channel_name].values
        #print(xx)
        print(xx.shape)
    
        print(f'Adding Delayed Channels: {channel_name}')
        channel_b = HalfNormal(f'beta_{channel_name}', sd=5) # Keep Coefficients Positive
    
        alpha = Beta(f'alpha_{channel_name}', alpha=3, beta=3)
        channel_mu = Gamma(f'mu_{channel_name}', alpha=3, beta=1)
        response_mean.append(logistic_function(geometric_adstock_tt(xx, alpha), channel_mu) * channel_b)

    # channels that can have SATURATION effects only
    for channel_name in non_lin_channels:
        xx = df_in[channel_name].values
    
        print(f'Adding Non-Linear Logistic Channel: {channel_name}')
        channel_b = HalfNormal(f'beta_{channel_name}', sd=5)
    
        #logistic reach curve
        channel_mu = Gamma(f'mu_{channel_name}', alpha=3, beta=1)
        response_mean.append(logistic_function(xx, channel_mu) * channel_b)
    
    # Continuous control variables
    if control_vars:
        for channel_name in control_vars:
            x = df_in[channel_name].values
        
            print(f'Adding Control: {channel_name}')
        
            control_beta = Normal(f'beta_{channel_name}', sd=.25)
            channel_contrib = control_beta * x
            response_mean.append(channel_contrib)
        
    # Categorical control variables
    if index_vars:
        for var_name in index_vars:
            x = df_in[var_name].values

            shape_v = len(set(x))

            print(f'Adding Index Variable: {var_name}')

            ind_beta = Normal('beta_' + var_name, sd=.5, shape=shape_v)

            channel_contrib = ind_beta[x]
            response_mean.append(channel_contrib)
        
    # Noise level -- removed after the divergence tip above:
    # sigma = Exponential('sigma', 10)

    # Define likelihood (with no explicit sigma, Normal defaults to sd=1)
    likelihood = Normal(outcome, mu=sum(response_mean), observed=df_in[outcome].values)


with model:
    trace = pm.sample(10000, tune=1000, chains=2, init="adapt_diag", random_seed=SEED, return_inferencedata=True, cores=1)

az.plot_trace(trace, compact=True)


az.summary(trace)

with model:
    ppc = pm.sample_posterior_predictive(trace, random_seed=SEED)

az.plot_ppc(az.from_pymc3(posterior_predictive=ppc, model=model));

Plot_PPC

Again, thanks for the help!

Hi, I am building the same media mix model, but it gives me the following error (summarized):

ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV beta_Seasonality3.ravel()[0] is zero.
The derivative of RV beta_Seasonality3.ravel()[1] is zero.
The derivative of RV mu_Banners_log__.ravel()[0] is zero.
The derivative of RV mu_Radio_log__.ravel()[0] is zero.
The derivative of RV alpha_Radio_logodds__.ravel()[0] is zero.
The derivative of RV mu_TV_log__.ravel()[0] is zero.
The derivative of RV alpha_TV_logodds__.ravel()[0] is zero.

The likelihood is built with pm.Normal.

In the dataset I have float 0.0 (zero) values, because in some weeks there is zero spend, and the seasonality values are one-hot encoded as floats 0.0 and 1.0.

Is there a different distribution than pm.Normal that can handle a dataset with zero values?
Thanks,

Marcello