I’m creating a Media Mix Model using pymc3 based on this posting:
I’m pretty inexperienced with pymc3 and Bayesian modeling in general. Although I’ve taken my fair share of machine learning courses in college, most of that material seemed very convoluted and hard to apply in practice. I did learn about Bayesian networks, so I understand the gist of how they work, and after weeks of research on pymc3 and Bayesian modeling I’m understanding it more and more…
However, I’ve got some model output that is very difficult for me to decipher. I’m pretty clueless about how to interpret the distributions being assigned to my coefficients, how it all ties together, and whether my model is complete garbage or I’m on the right track.
In my model, I have four channels for media spend, two control variables for internal and competitive pricing, and some categorical variables for seasonality such as month and year, along with a dummy variable for COVID severity. The target variable is reservations, not revenue. I have roughly four years of weekly data, so just over 200 records of sales data. Two of the media channels only have about 3 years of data, whereas the other two have about 4 years.
I’m not using any adstock functions on my channels, just to keep things simple, but I am running each one through a saturation function as described in the article.
For the media spend channels, I’m using a Gamma distribution with alpha=3 and beta=1 for the mu, and a HalfNormal with a standard deviation of 5 for the beta, and feeding them into a logistic function. Here is the sample code:
xx = df_in[channel_name].values
print(f'Adding Non-Linear Logistic Channel: {channel_name}')
# channel coefficient (HalfNormal keeps it non-negative)
channel_b = HalfNormal(f'beta_{channel_name}', sd=5)
# saturation parameter for the logistic reach curve
channel_mu = Gamma(f'mu_{channel_name}', alpha=3, beta=1)
response_mean.append(logistic_function(xx, channel_mu) * channel_b)
I’m treating the internal and competitor pricing as continuous control variables and using a Normal distribution for the beta with a standard deviation of 5 (I believe he used 0.25 for the sd in his presentation). Here is the code:
x = df_in[channel_name].values
print(f'Adding Control: {channel_name}')
control_beta = Normal(f'beta_{channel_name}', sd=5)
channel_contrib = control_beta * x
response_mean.append(channel_contrib)
Finally I’m treating my Month and COVID variables as categorical variables. I’m not sure exactly how this portion of the code works but I copied from his presentation:
for var_name in index_vars:
    x = df_in[var_name].values
    shape_v = len(set(x))  # number of distinct levels
    print(f'Adding Index Variable: {var_name}')
    # one coefficient per level of the categorical variable
    ind_beta = Normal('beta_' + var_name, sd=.5, shape=shape_v)
    # index into the coefficient vector with the integer codes
    channel_contrib = ind_beta[x]
    response_mean.append(channel_contrib)
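From what I can tell, the ind_beta[x] line is just NumPy-style fancy indexing: each observation picks up the coefficient for its own level. A minimal sketch outside the model (the codes and values here are made up by me for illustration):

```python
import numpy as np

# Hypothetical integer codes for six weekly rows (0 = first month level, etc.)
month_codes = np.array([0, 0, 1, 1, 2, 2])

# One coefficient per distinct level, analogous to ind_beta with shape=shape_v
betas = np.array([0.5, -0.2, 0.8])

# Fancy indexing maps each row's code to its level's coefficient,
# which is what ind_beta[x] does inside the model
contribution = betas[month_codes]
print(contribution)  # [ 0.5  0.5 -0.2 -0.2  0.8  0.8]
```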
I set the prior on sigma as an Exponential, as follows:
sigma = Exponential('sigma', 10)
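If I understand correctly, the 10 here is the rate parameter lam, so this prior has mean 1/10 — i.e. it expects fairly small noise. A quick check with scipy (scipy is just my stand-in for illustration):

```python
from scipy import stats

# pymc3's Exponential takes a rate lam, so Exponential('sigma', 10)
# corresponds to scipy's expon with scale = 1 / lam
sigma_prior = stats.expon(scale=1 / 10)
print(sigma_prior.mean())  # 0.1
print(sigma_prior.std())   # 0.1
```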
Lastly, I define the likelihood like this:
likelihood = Normal(outcome, mu=sum(response_mean), sigma=sigma, observed=df_in[outcome].values)
response_mean is a list that I instantiate at the beginning of the model, like this:
with Model() as model:
    response_mean = []
Each of the channels is run inside a loop, iterating through a list of keys indexing my main dataframe. This code can be found in the video presentation, and the logistic function is also defined there.
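For reference, the logistic saturation function I copied looks roughly like this — I may be misremembering the exact form from the presentation, so treat this NumPy version as my assumption (in the actual model, mu is a pymc3 random variable rather than a float):

```python
import numpy as np

def logistic_function(x_t, mu=0.1):
    # Saturating reach curve: 0 at zero spend, approaching 1 as spend grows;
    # mu controls how quickly the channel saturates
    return (1 - np.exp(-mu * x_t)) / (1 + np.exp(-mu * x_t))

spend = np.array([0.0, 10.0, 100.0, 1000.0])
print(logistic_function(spend, mu=0.1))
```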
After creating the model, I call sample like this to fit it (if I’m not mistaken, that’s what pm.sample() does?):
with model:
    trace = pm.sample(10000, tune=1000, chains=2, init="adapt_diag",
                      random_seed=SEED, return_inferencedata=True, cores=1)
I’ve attached screenshots of the plot_trace and plot_posterior output:
Finally, I try to see how close the posterior predictive is to the actual target by running this code:
with model:
    ppc = pm.sample_posterior_predictive(trace)
    az.plot_ppc(az.from_pymc3(posterior_predictive=ppc, model=model));
I’ve also outputted the summary metrics with:
az.summary(trace)
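In that summary I’ve mainly been looking at r_hat and the effective sample sizes, since I gather those indicate whether the chains mixed. A toy example of what I mean — the posterior here is fake random data I made up, just to show the columns:

```python
import numpy as np
import arviz as az

# Toy posterior standing in for the real trace: 2 chains x 500 draws
# of a single fake coefficient
rng = np.random.default_rng(0)
idata = az.from_dict(posterior={"beta": rng.normal(size=(2, 500))})

summary = az.summary(idata)
# r_hat close to 1.0 and large ess_bulk suggest the chains mixed well
print(summary[["mean", "r_hat", "ess_bulk"]])
```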
Any help guiding me in the right direction on this would be much appreciated…
If anyone actually took their time to read through and follow along with this, thank you!!!