This might be a simple question but I’m having difficulty coming out with a single prediction from my model. Generally a model will have the target variable and a set of predictor variables, and generates the predicted target. In the case of say a linear regression, the algorithm will essentially be optimizing the regression equation coefficients to minimize the error between the predicted value and target. Whereas, in a Bayesian statistical model the coefficients are not single points but rather distributions and thus the predicted value is also a distribution.
Ultimately, I understand that within the distributions converged upon for all of the coefficients and predicted values, there is a mean and standard deviation. It would be best for the mean of the predicted distribution to be close to the target since that would indicate that the model was able to more or less predict the target, and it would be even better for the standard deviation to be as small as possible since that would indicate a narrower distribution, potentially increasing the accuracy. Having a huge standard deviation on the predicted distributions wouldn’t exactly be ideal but it’s more important to have the mean of the predicted distribution be close to the target than having a small standard deviation.
I’m creating a fairly complex Media Mix Model combing Halfnormal, Normal and Gamma distributions for the priors and I’m able to fit my model using:
pm.sample(10000, tune=1000, init="adapt_diag", random_seed=SEED, return_inferencedata=True, cores=1)
without any divergences.
If I want to extract the summary metrics for the distributions converged upon for each prior, I can use
az.summary(trace)
and generate an equation using the mean of the coefficient distributions, analogous to if I ran a multi-variate linear regression and generated coefficients for each predictor variable and plugged those into a simple equation such as: y = ax1 + bx2 + cx3 + d
In this case for a Media Mix model the equation would be more like:
y = logistic(ch1, mu1) * b1 + logistic(ch2, mu2) * b2 + logistic(ch3, mu3) * b3 + ctrl1 * beta_ctrl1 + ctrl2 * beta_ctrl2 + sigma
ch1, ch2 and ch3 represent my media channels and b1, b2 and b3 represent my channel betas which are HalfNormal distributions (ensuring a positive coefficient) while mu1, mu2, and mu3 are Gamma distributions representing the mu in my logistic equation. ctrl1 and ctrl2 are my continuous control variables and beta_ctrl1 and beta_ctrl2 are Normal distributions.
I’ve created my model, fitted the target, and generated the coefficients. Finally I run the posterior predictive check using this code:
with model:
ppc = pm.sample_posterior_predictive(trace, random_seed=SEED)
az.plot_ppc(az.from_pymc3(posterior_predictive=ppc, model=model));
and the resulting chart is as follows:
I don’t understand why the posterior predictive mean and posterior predictive target is so far off from the observed target. . .
I don’t have an extensive background in the nuances of Bayesian modeling so it’s hard for me to pin point what some of the causes of this could be and whether this is a very significant deviation of the model prediction from the actual values.
What I’d like to know is. .
Are there any best practices to check variables before running the model?
- Should I log transform the predictors or the target variable? (In my case I do log transform my target and my control variables)
- Are there any specific checks I should perform on the relationships between the predictors and the target and what should I look for?
- How many samples should I use when fitting my model, and what does the tune parameter do?
- How do I know if my coefficients converged on valid distributions?
After running my model, what are some checks that I should perform to measure the validity and accuracy of my model?
- Currently I am running the posterior predictive check (PPC) and get whacky results for the posterior predicted target. .
- What are some methods for my to generate a simple “predicted” or “scored” column which I can use to calculated simple error metrics such as RMSE or MAPE? (I believe I’ve explained what I think would be the method above but I’m not sure if this is correct)
- Am I extracting the coefficients correctly using az.summary(trace)?
If anyone has any experience with Media Mix Modeling and Pymc3, please reach out to me!!!
Also, please correct me if my understanding is at all incorrect or missing any key pieces of information above!