sample_posterior_predictive() works fine in PyMC 3, raises an exception in v4

In PyMC 4.2, sample_posterior_predictive() raises an exception in a situation that works fine in PyMC 3.11.

Consider the following simple and rather silly model:

import pymc as pm
import numpy as np 

# True parameter values
alpha, sigma = 1, 1
beta = [1, 2.5]

# Size of dataset
size = 100

# Predictor variables
X1 = np.random.randn(size)
X2 = np.random.randn(size) * 0.2

# Simulate outcome variable
Y = alpha + beta[0] * X1 + beta[1] * X2 + np.random.randn(size) * sigma

with pm.Model() as m1:
    # Priors for unknown model parameters
    alpha = pm.Normal("alpha", mu=0, sigma=10)
    alpha2 = pm.Normal("alph2", mu=alpha, sigma=0.2)
    beta = pm.Normal("beta", mu=0, sigma=10, shape=2)
    sigma = pm.HalfNormal("sigma", sigma=1)
    
    # Expected value of outcome
    y_predicted = pm.Deterministic('y_predicted', alpha2 + beta[0] * X1 + beta[1] * X2)

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal("Y_obs", mu=y_predicted, sigma=sigma, observed=Y)

with m1:
    trace = pm.sample(500, return_inferencedata=True)

Now suppose there is a second, somewhat different model, from which I draw posterior predictive samples using @lucianopaz’s model factory technique:

with pm.Model() as m_forward:
    alpha2 = pm.Normal("alph2", mu=0, sigma=1)   # dummy variable; values to be captured from m1's trace
    beta = pm.Normal("beta", mu=0, sigma=10, shape=2)
    sigma = pm.HalfNormal("sigma", sigma=1)
    
    # Expected value of outcome
    y_predicted = pm.Deterministic('y_predicted', alpha2 + beta[0] * X1 + beta[1] * X2)

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal("Y_obs", mu=y_predicted, sigma=sigma, observed=Y)

with m_forward:
    ppm = pm.sample_posterior_predictive(
        trace=trace,
        var_names=["sigma", "beta", "y_predicted", "Y_obs"]
    )

This works as expected in PyMC 3.11. But in PyMC 4.2.1, sample_posterior_predictive() raises a KeyError, complaining that alpha is present in the trace but not in the sampled model.

alpha is not in fact present in the second model, by design.

Is this a bug in PyMC 4, or a behavior of the prior version that was deliberately not carried over to the new one? Are model factories simply not intended to work in the new version, so that I need to find a different technique? Or is there some simple way to make model factories work, e.g. removing alpha from the trace before passing it to sample_posterior_predictive()?
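For example, something like this untested sketch (it assumes trace is an ArviZ InferenceData and keeps only the posterior group, with alpha dropped from it):

import arviz as az

# Untested sketch: drop the variable that exists only in m1 from the
# posterior group before handing the trace to the forward model.
trace_forward = az.InferenceData(posterior=trace.posterior.drop_vars("alpha"))

with m_forward:
    ppm = pm.sample_posterior_predictive(
        trace_forward,
        var_names=["sigma", "beta", "y_predicted", "Y_obs"],
    )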


I’m not sure what the expected behavior is here. You have a trace from a model with one set of parameters and are trying to use it to generate predictive samples from a second model with a different set of parameters. The error you are seeing indicates that the trace and the model are not consistent. Predictive sampling could conceivably be allowed whenever the necessary parameters are present in the trace, but that seems like a recipe for bugs.

Note that the model factory example you pointed to changes the shape of various parameters, but every model that comes out of that factory (as far as I can see) has the same set of named parameters. That is roughly the pattern sketched below.
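Concretely, the factory pattern looks something like this (a sketch adapted from your m1/m_forward, not the code from the linked topic): the same named RVs are created on every call, only the data changes, so the trace stays consistent with the prediction model.

def model_factory(X1, X2, Y):
    # Same named RVs on every call; only the data passed in changes.
    with pm.Model() as model:
        alpha2 = pm.Normal("alph2", mu=0, sigma=10)
        beta = pm.Normal("beta", mu=0, sigma=10, shape=2)
        sigma = pm.HalfNormal("sigma", sigma=1)
        y_predicted = pm.Deterministic(
            "y_predicted", alpha2 + beta[0] * X1 + beta[1] * X2
        )
        pm.Normal("Y_obs", mu=y_predicted, sigma=sigma, observed=Y)
    return model

# Fit on the original data...
with model_factory(X1, X2, Y):
    trace2 = pm.sample(500)

# ...then predict on new (here made-up) predictors; the zeros placeholder
# only fixes the shape of Y_obs for forward sampling.
X1_new, X2_new = np.random.randn(50), np.random.randn(50) * 0.2
with model_factory(X1_new, X2_new, np.zeros(50)):
    ppc = pm.sample_posterior_predictive(trace2, var_names=["y_predicted", "Y_obs"])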


CC @lucianopaz: should we be okay with passing a trace to a model with fewer variables?


Seems risky to me. At the very least, a semi-scary warning should probably be raised.


Not having some variables in the model, or having extra variables in the model, should work fine. For example, when you create a conditional GP, the model has an extra random variable that usually isn’t in the trace, but its values are conditioned on the rest of the model parameters.
This is a dumb bug in get_vars_in_point_list. I’ll fix it this week.
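To illustrate the conditional GP case (a rough sketch, not tested against 4.2.1): f_pred below never appears in the trace, yet posterior predictive sampling can still draw it, conditioned on the sampled hyperparameters.

import numpy as np
import pymc as pm

# Toy 1-D regression data
X = np.linspace(0, 1, 20)[:, None]
y = np.sin(2 * np.pi * X).ravel() + 0.1 * np.random.randn(20)

with pm.Model() as gp_model:
    ls = pm.Gamma("ls", alpha=2, beta=1)
    eta = pm.HalfNormal("eta", sigma=1)
    cov = eta**2 * pm.gp.cov.ExpQuad(1, ls=ls)
    gp = pm.gp.Marginal(cov_func=cov)
    noise = pm.HalfNormal("noise", sigma=1)
    gp.marginal_likelihood("y", X=X, y=y, noise=noise)  # `sigma=` in newer PyMC versions
    trace_gp = pm.sample(500, chains=2)

with gp_model:
    # f_pred is NOT in trace_gp; its values are conditioned on the
    # sampled hyperparameters (ls, eta, noise) during forward sampling.
    X_new = np.linspace(0, 1.5, 40)[:, None]
    f_pred = gp.conditional("f_pred", X_new)
    ppc = pm.sample_posterior_predictive(trace_gp, var_names=["f_pred"])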
