Understanding a runaway posterior predictive plot

I have two different models and would like to understand why one produces poor results and the other does not.

The first model (the "simple" model) produces reasonable results.

import numpy as np
import pymc as pm

with pm.Model() as model_alpha_beta_simple:
    rho = pm.Normal(name="rho", mu=0, sigma=0.1)
    beta = beta_vals
    alpha = alpha_vals
    nu = pm.Normal(name="nu", mu=4.6, sigma=0.3)
    residual_ratio = pm.StudentT(
        name="residual_ratio",
        mu=np.power(
            np.power(alpha, 2)
            + np.power(beta * independent_vals, 2)
            + 2 * beta * rho * alpha * independent_vals,
            0.5,
        ) / dependent_vals - 1,
        sigma=0.06,
        nu=nu,
        observed=residual_ratio_vals,
    )

    # Inference!
    # draw 1000 posterior samples (after 1000 tuning steps) using NUTS
    idata_simple = pm.sample(draws=1000, tune=1000)

The second model produces a really poor posterior predictive plot. In this model, I add a random variable to describe the sigma of my residuals.

with pm.Model() as model_alpha_beta:
    rho = pm.Normal(name="rho", mu=0, sigma=0.1)
    beta = beta_vals
    alpha = alpha_vals
    sigma = pm.Gamma(name="sigma", alpha=1.0, beta=10.0)
    nu = pm.Normal(name="nu", mu=4.6, sigma=0.3)
    residual_ratio = pm.StudentT(
        name="residual_ratio",
        mu=np.power(
            np.power(alpha, 2)
            + np.power(beta * independent_vals, 2)
            + 2 * beta * rho * alpha * independent_vals,
            0.5,
        ) / dependent_vals - 1,
        sigma=sigma,
        nu=nu,
        observed=residual_ratio_vals,
    )

    # Inference!
    # draw 1000 posterior samples (after 1000 tuning steps) using NUTS
    idata = pm.sample(draws=1000, tune=1000)

Please help me understand why this causes an issue or how I could answer this question on my own.

Welcome!

I would guess that your prior on sigma is making your posterior far too wide. Without more domain knowledge, I would suggest running some prior predictive checks and tweaking your priors until those checks look plausible.
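Within PyMC you can get these directly with pm.sample_prior_predictive, but as a quick standalone sketch, here is what a Gamma(alpha=1, beta=10) prior on sigma implies, done by hand with numpy (this is an illustration, not your exact model; the fixed sigma=0.06 from the simple model is the point of comparison):

```python
import numpy as np

rng = np.random.default_rng(42)

# Draws from the Gamma(alpha=1, beta=10) prior used for sigma in the
# second model. numpy parameterizes gamma by shape and scale = 1/beta.
sigma_draws = rng.gamma(shape=1.0, scale=1.0 / 10.0, size=10_000)

# The prior mean is alpha/beta = 0.1, but the right tail is long.
print("mean sigma:", sigma_draws.mean())
print("99th percentile of sigma:", np.quantile(sigma_draws, 0.99))

# Push the prior through the likelihood shape: Student-t residuals with a
# prior-drawn sigma can be far wider than the fixed sigma=0.06 model allows.
nu_draws = rng.normal(4.6, 0.3, size=10_000)
residual_draws = sigma_draws * rng.standard_t(df=nu_draws)
print("largest |residual| under the prior:", np.abs(residual_draws).max())
```

If the implied residual spread looks implausible against your observed residual_ratio_vals, that is a sign the prior needs tightening.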

Thanks for the reply. I plotted the prior predictive for sigma and it was indeed wider than I thought. I switched to a half-normal and tightened up the distribution, but I still ran into the same problem. Looking more closely at the posterior predictive data, I found results that were orders of magnitude larger than anything observed, even though the priors all look reasonable now. Is there a way to see which specific random variables lead to a specific result in the posterior distribution?

Also, another modeling question: I am passing in independent_vals, which is constant data. Where is this data stored in the inference data object, and how is it used in each draw?

What nu does your model learn in both cases?

Your second plot predicts posteriors with a mean != 0, which hints at nu < 1.

I assume the rest of the data you feed into both models is the same.
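To see why nu < 1 would produce runaway predictions: a Student-t with nu <= 1 has no finite mean, and even a modest number of draws contains enormous outliers. A quick numpy illustration of the raw distribution (not your model):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100_000
heavy = rng.standard_t(df=0.5, size=n)  # nu < 1: mean is undefined
mild = rng.standard_t(df=4.6, size=n)   # nu around the prior mean

# With nu = 0.5 the largest draws are orders of magnitude beyond anything
# a nu ~ 4.6 likelihood would ever produce.
print("max |t(nu=0.5)| draw:", np.abs(heavy).max())
print("max |t(nu=4.6)| draw:", np.abs(mild).max())
```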

To analyze the parameter combinations, you can look into the trace values, e.g.

idata_simple.posterior['residual_ratio'].values
idata_simple.posterior['nu'].values

and compare the data at the same indices.
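As a concrete sketch of that index-by-index comparison, with hypothetical numpy arrays standing in for the flattened .values of the posterior (the names and numbers here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for idata.posterior['nu'].values and the matching per-draw
# residuals, flattened over chains and draws.
nu_vals = rng.normal(1.2, 0.5, size=2000)
nu_vals = np.clip(nu_vals, 0.05, None)  # keep df positive for sampling
residual_vals = 0.06 * rng.standard_t(df=nu_vals)

# Draws sharing an index come from the same posterior sample, so we can
# ask which nu values co-occur with extreme residuals.
extreme = np.abs(residual_vals) > 1.0
print("fraction of extreme residuals:", extreme.mean())
print("mean nu on extreme draws:", nu_vals[extreme].mean())
print("mean nu on ordinary draws:", nu_vals[~extreme].mean())
```

If the extreme draws cluster at small nu, that pins the blow-ups on the nu posterior rather than on sigma.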


Thanks for the advice. It was fitting a nu < 1 which did cause crazy residuals. After holding the nu constant, I get much more reasonable values.

Sounds good!

Another option is to keep nu out of the range of values you consider unreasonable, e.g. by constraining it like this:

import pymc as pm

with pm.Model():
    nu_dist = pm.Normal.dist(mu=1, sigma=0.2)
    truncated_nu = pm.Truncated("truncated_nu", nu_dist, lower=1 + 1e-6)
    student_dist = pm.StudentT.dist(nu=truncated_nu)

    trace = pm.sample(tune=200, draws=200)

print(trace.posterior['truncated_nu'].mean(), trace.posterior['truncated_nu'].std())

That is very helpful. I was trying to do this by changing the distribution, but sometimes it is easier to just use my domain knowledge and mandate that the random variable cannot reach certain values.

Also, I was having trouble understanding why the posterior was so wide while the resulting distribution was so tight. The issue was that my observed values follow a process very similar to mu, so of course that distribution is going to be very tight. To fix this, I needed to change the observed values to 0, which represents the distribution I am actually trying to fit.