Trouble understanding how to use sample_posterior_predictive to generate a prediction

The syntax from the example you linked is somewhat out of date. coords are now deeply integrated into PyMC, so you don’t have to do all this juggling with az.from_pymc. Here’s a simplified model:

```python
import numpy as np
import pymc as pm

# `features` (n x 3 array) and `target` (length-n array) are your own data.
# pm.MutableData is the PyMC 4/5 API used here; newer releases deprecate it in favour of pm.Data.
coords = {'features': ['treatment', 'cov1', 'cov2']}
coords_mutable = {'obs_id': np.arange(len(target))}

with pm.Model(coords=coords, coords_mutable=coords_mutable) as sk_lm:
    feature_data = pm.MutableData('feature_data', features, dims=('obs_id', 'features'))

    alpha = pm.Normal('alpha', mu=0, sigma=1)
    betas = pm.Normal('betas', mu=[40, 90, 50], sigma=5, dims='features')
    sigma = pm.Exponential("sigma", lam=1)

    mu = pm.Deterministic("mu", alpha + feature_data @ betas, dims='obs_id')

    y = pm.Normal("y", mu=mu, sigma=sigma, observed=target, dims='obs_id')

    idata = pm.sample(init='jitter+adapt_diag')
```
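To see why mu and y carry dims='obs_id', here is a numpy-only sketch of a single forward pass through the same linear model. The sizes and parameter values are made up for illustration; only the shapes matter.

```python
import numpy as np

# Hypothetical sizes: 5 observations, 3 features (treatment, cov1, cov2).
n_obs, n_features = 5, 3
rng = np.random.default_rng(0)

features = rng.normal(size=(n_obs, n_features))  # dims ('obs_id', 'features')
alpha = 0.5                                      # scalar intercept
betas = np.array([40.0, 90.0, 50.0])             # dims ('features',)
sigma = 1.0

# feature_data @ betas contracts the 'features' axis, leaving one value per row,
# which is why mu and y are indexed by 'obs_id' only.
mu = alpha + features @ betas
y = rng.normal(mu, sigma)

print(mu.shape, y.shape)  # (5,) (5,)
```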

Since you're planning to do out-of-sample prediction, you want to declare the index coordinate (obs_id) with coords_mutable. That lets you change its length later, when new data comes in. Here's what that looks like:

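```python
# Three new rows (random placeholders here), one column per feature
new_data = np.random.randn(3, 3)

with sk_lm:
    # Swap in the new features and give them fresh obs_id labels that
    # continue on from the training index
    pm.set_data({"feature_data": new_data},
                coords={'obs_id': np.arange(3) + len(target)})
    # predictions=True stores the draws in the "predictions" group
    # instead of "posterior_predictive"
    idata = pm.sample_posterior_predictive(idata, predictions=True, extend_inferencedata=True)
```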
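Roughly speaking, for this model that call pushes each posterior draw of alpha, betas and sigma through the likelihood again, using the swapped-in feature_data. The numpy sketch below mimics that by hand. The posterior draws are random stand-ins rather than real MCMC output, and the result is laid out like the predictions for y, with (chain, draw, obs_id) axes.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior draws with (chain, draw) leading axes, mimicking the
# layout of idata.posterior. Real draws would come from pm.sample.
n_chains, n_draws = 4, 250
alpha = rng.normal(0.0, 1.0, size=(n_chains, n_draws))
betas = rng.normal([40.0, 90.0, 50.0], 5.0, size=(n_chains, n_draws, 3))
sigma = rng.exponential(1.0, size=(n_chains, n_draws))

new_data = rng.normal(size=(3, 3))  # 3 new rows ('obs_id') x 3 features

# Push every posterior draw through the likelihood with the new design matrix.
# einsum contracts the feature axis: (chain, draw, f) x (obs, f) -> (chain, draw, obs)
mu_new = alpha[..., None] + np.einsum("cdf,of->cdo", betas, new_data)
y_pred = rng.normal(mu_new, sigma[..., None])

print(y_pred.shape)              # (4, 250, 3)
print(y_pred.mean(axis=(0, 1)))  # posterior predictive mean per new row
```

With a real idata, the equivalent summary would be something like idata.predictions["y"].mean(("chain", "draw")).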

Because obs_id was declared in coords_mutable, I can pass new coords to pm.set_data. This is a common gotcha with out-of-sample prediction: you have to make sure the shape of the observed variable (y here) gets updated to match the new data, even though the observed values themselves aren't used when predicting. This is discussed in more depth here.
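The gotcha can be reproduced outside PyMC. This numpy sketch, with a made-up training size of 100, shows what goes wrong if the likelihood's size stays pinned to the training data after three new rows are swapped in, and why letting it follow the updated obs_id coordinate fixes it. It is an analogy for the shape logic, not PyMC's internal code.

```python
import numpy as np

rng = np.random.default_rng(0)

n_train = 100                  # hypothetical len(target)
mu_new = rng.normal(size=3)    # means for the 3 new rows
sigma = 1.0

# If the observed variable's size were still tied to the training target,
# the likelihood would try to draw len(target) values around only 3 means.
try:
    rng.normal(mu_new, sigma, size=n_train)
except ValueError as err:
    print("shape mismatch:", err)

# With the size following the updated obs_id coordinate, the draw lines up.
obs_id = np.arange(3) + n_train  # same new coords as in the set_data call
y_new = rng.normal(mu_new, sigma, size=len(obs_id))
print(y_new.shape)               # (3,)
```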
