In PyMC 4.2, sample_posterior_predictive() raises an exception in a situation that works fine in PyMC 3.11.
Consider the following simple and rather silly model:
import pymc as pm
import numpy as np
# True parameter values
alpha, sigma = 1, 1
beta = [1, 2.5]
# Size of dataset
size = 100
# Predictor variable
X1 = np.random.randn(size)
X2 = np.random.randn(size) * 0.2
# Simulate outcome variable
Y = alpha + beta[0] * X1 + beta[1] * X2 + np.random.randn(size) * sigma
with pm.Model() as m1:
# Priors for unknown model parameters
alpha = pm.Normal("alpha", mu=0, sigma=10)
alpha2 = pm.Normal("alph2", mu=alpha, sigma=0.2)
beta = pm.Normal("beta", mu=0, sigma=10, shape=2)
sigma = pm.HalfNormal("sigma", sigma=1)
# Expected value of outcome
y_predicted = pm.Deterministic('y_predicted', alpha2 + beta[0] * X1 + beta[1] * X2)
# Likelihood (sampling distribution) of observations
Y_obs = pm.Normal("Y_obs", mu=y_predicted, sigma=sigma, observed=Y)
with m1:
trace = pm.sample(500, return_inferencedata=True)
Now suppose there is a second, somewhat different model, sampled posterior predictive using @lucianopaz’s model factory technique:
with pm.Model() as m_forward:
alpha2 = pm.Normal("alph2", mu=0, sigma=1) # dummy variable, values to be captured from m
beta = pm.Normal("beta", mu=0, sigma=10, shape=2)
sigma = pm.HalfNormal("sigma", sigma=1)
# Expected value of outcome
y_predicted = pm.Deterministic('y_predicted', alpha2 + beta[0] * X1 + beta[1] * X2)
# Likelihood (sampling distribution) of observations
Y_obs = pm.Normal("Y_obs", mu=y_predicted, sigma=sigma, observed=Y)
with m_forward:
ppm = pm.sample_posterior_predictive(
trace=trace,
var_names=["sigma", "beta", "y_predicted", "Y_obs"]
)
This works as expected in PyMC 3.11. But in PyMC 4.2.1, sample_posterior_predictive() throws a KeyError, complaining about alpha being in the trace but not in the sampled model.
alpha is not in fact present in the second model, by design.
Is this a bug with PyMC 4, a behavior of the prior version that was not enabled in the new? Or maybe model factories are not intended to work in the new version, and I need to find a different technique? Or maybe there is some simple way for me to make model factories work, e.g. removing alpha from the trace before passing it to sample_prior_predictive()?