How do I "Score" My Model

This might be a simple question but I’m having difficulty coming out with a single prediction from my model. Generally a model will have the target variable and a set of predictor variables, and generates the predicted target. In the case of say a linear regression, the algorithm will essentially be optimizing the regression equation coefficients to minimize the error between the predicted value and target. Whereas, in a Bayesian statistical model the coefficients are not single points but rather distributions and thus the predicted value is also a distribution.

Ultimately, I understand that within the distributions converged upon for all of the coefficients and predicted values, there is a mean and standard deviation. It would be best for the mean of the predicted distribution to be close to the target since that would indicate that the model was able to more or less predict the target, and it would be even better for the standard deviation to be as small as possible since that would indicate a narrower distribution, potentially increasing the accuracy. Having a huge standard deviation on the predicted distributions wouldn’t exactly be ideal but it’s more important to have the mean of the predicted distribution be close to the target than having a small standard deviation.

I’m creating a fairly complex Media Mix Model combing Halfnormal, Normal and Gamma distributions for the priors and I’m able to fit my model using:

pm.sample(10000, tune=1000, init="adapt_diag", random_seed=SEED, return_inferencedata=True, cores=1)

without any divergences.

If I want to extract the summary metrics for the distributions converged upon for each prior, I can use


and generate an equation using the mean of the coefficient distributions, analogous to if I ran a multi-variate linear regression and generated coefficients for each predictor variable and plugged those into a simple equation such as: y = ax1 + bx2 + cx3 + d

In this case for a Media Mix model the equation would be more like:

y = logistic(ch1, mu1) * b1 + logistic(ch2, mu2) * b2 + logistic(ch3, mu3) * b3 + ctrl1 * beta_ctrl1 + ctrl2 * beta_ctrl2 + sigma

ch1, ch2 and ch3 represent my media channels and b1, b2 and b3 represent my channel betas which are HalfNormal distributions (ensuring a positive coefficient) while mu1, mu2, and mu3 are Gamma distributions representing the mu in my logistic equation. ctrl1 and ctrl2 are my continuous control variables and beta_ctrl1 and beta_ctrl2 are Normal distributions.

I’ve created my model, fitted the target, and generated the coefficients. Finally I run the posterior predictive check using this code:

with model:
    ppc = pm.sample_posterior_predictive(trace, random_seed=SEED)
az.plot_ppc(az.from_pymc3(posterior_predictive=ppc, model=model));

and the resulting chart is as follows:


I don’t understand why the posterior predictive mean and posterior predictive target is so far off from the observed target. . .

I don’t have an extensive background in the nuances of Bayesian modeling so it’s hard for me to pin point what some of the causes of this could be and whether this is a very significant deviation of the model prediction from the actual values.

What I’d like to know is. .

Are there any best practices to check variables before running the model?

  • Should I log transform the predictors or the target variable? (In my case I do log transform my target and my control variables)
  • Are there any specific checks I should perform on the relationships between the predictors and the target and what should I look for?
  • How many samples should I use when fitting my model, and what does the tune parameter do?
  • How do I know if my coefficients converged on valid distributions?

After running my model, what are some checks that I should perform to measure the validity and accuracy of my model?

  • Currently I am running the posterior predictive check (PPC) and get whacky results for the posterior predicted target. .
  • What are some methods for my to generate a simple “predicted” or “scored” column which I can use to calculated simple error metrics such as RMSE or MAPE? (I believe I’ve explained what I think would be the method above but I’m not sure if this is correct)
  • Am I extracting the coefficients correctly using az.summary(trace)?

If anyone has any experience with Media Mix Modeling and Pymc3, please reach out to me!!!

Also, please correct me if my understanding is at all incorrect or missing any key pieces of information above!

Others here are far more qualified than I, but two things:

  1. the much broader distribution of the posterior predictive could just reflect very broad (uninformative) priors. You seem to know what kind of results you are expecting, so encapsulate this in your priors. Which leads me to…
  2. Have you run a prior predictive check to see that the priors broadly cover the domain of the targets?


Your ppc plot shows clear signs of overdispersion (link to a blogpost of my own). It could be due to the choice of priors, for example the prior on the scale parameter having too much of its weight on too large values or due to model miss specification.

You should not look at the number of samples but at the number of effective samples (ess). In most cases, an effective sample size of ~400 is enough, depending on the accuracy you want on you estimates (or if you want accurate estimates of tail properties) you may need more. The MCSE will give you an idea about where you are at. Both these quantities can be used to assess convergence of the MCMC. Take a look at this PyMCon talk to get both an overview on model convergence and more details on the strange words and acronyms I have just written:

The tune parameter regulates the number of warmup/tuning iterations during which the sampler parameters are tuned in order to be able to sample from the posterior efficiently and accurately. This blog post covers that in more detail with diagrams and intuitive explanations in case you want to learn more about tuning.

Here I am a bit lost about both what you are doing and about what you want to do. If you are using sample_posterior_predictive, is should internally take care of any equation you have like the y = logistic... because you have had to use it when creating and sampling the model. It is hard to see what is happening only with parts of the code.

I believe the reason behind part of the confusion could be behind these sentences, but this is a guess:

You may have set their priors to these distributions, but this does not mean that their posterior distributions are the same type of distribution with different parameters, a parameter whose prior distribution is an exponential distribution can perfectly have a normal distribution as posterior distribution. You can use plots like az.plot_posterior or az.plot_density to explore the posterior distributions visually.

1 Like