Trouble understanding how to use sample_posterior_predictive to generate a prediction

Joshua_Noble · August 25, 2023, 8:12pm

I’m quite new to PyMC and admitteldy much more comfortable in R and Stan than Py so I’m probably doing something rudimentary wrong and I’m not tracking all the variations between different versions of PyMC. I have a very simple model that I’ve been toying with to learn that I’ve copied from here. I had started with more complex modeling but realized that I was in way over my head.

Anyhow, here is my attempt to get some predictions:

# Predictions - generate new data
new_data = np.random.randn(3, 3)

#concat my existing features with new ones?
tmp = np.concatenate((features, new_data))

# should coords have 3 or 103? Neither seems to work confusingly.
new_coords = {
          # dim 1: len of df
          'obs_id': np.arange(0,3), #is this all coords from concat'd set or just for new?
          # dim 2: feature cats.
          'features': ['treatment', 'cov1', 'cov2']
}

with hierarchical_model:
    # set it over the feature data? Is this needed? doesn’t work anyways.
    hierarchical_model.set_data(name = "feature_data", values = tmp, coords=new_coords)
    # generate preds, seems to work
    predictions = pm.sample_posterior_predictive(hierarchical_trace, predictions=True)
    # this doesn’t exist any more, not sure how to replace it?
    preds = az.from_pymc3_predictions(predictions,
                              coords=new_coords,
                              idata_orig=lm_idata,
                              inplace=False)

# # get y-hat for new X data
preds.predictions['y'].median(dim=('chain', 'draw')).values

# # view y-hat as 94% HDI
az.plot_posterior(preds, group="predictions");

I’m guessing that there’s some trickery with how we set the co-ordinates that I can’t seem to puzzle out from the docs and sadly much of the code that I can find to reference no longer seems to work. Any advice much apprecated!

jessegrabowski · August 26, 2023, 9:16pm

The syntax from the example you linked is somewhat out of date. coords are now deeply integrated into PyMC, so you don’t have to do all this juggling with az.from_pymc. Here’s a simplified model:

coords = {'features': ['treatment', 'cov1', 'cov2']}
coords_mutable = {'obs_id': np.arange(len(target))}

with pm.Model(coords=coords, coords_mutable=coords_mutable) as sk_lm:
    feature_data = pm.MutableData('feature_data', features, dims=('obs_id', 'features'))

    alpha = pm.Normal('alpha', mu=0, sigma=1)
    betas = pm.Normal('betas', mu=[40, 90, 50], sigma=5, dims='features')
    sigma = pm.Exponential("sigma", lam=1)

    mu = pm.Deterministic("mu", alpha + feature_data @ betas, dims='obs_id')

    y = pm.Normal("y", mu=mu, sigma=sigma, observed=target, dims='obs_id')

    idata = pm.sample(init='jitter+adapt_diag')

Since you’re planning to do out-of-sample stuff, you want to set the index coordinate with coords_mutable. That lets you change it down the road. Down the road is here:

new_data = np.random.randn(3, 3)

with sk_lm:
    pm.set_data({"feature_data": new_data}, 
                coords={'obs_id':np.arange(3) + len(target)})
    idata = pm.sample_posterior_predictive(idata, predictions=True, extend_inferencedata=True)

Since I used coords_mutable, I can pass new coords to pm.set_data. This is a common gotcha with out-of-sample prediction – you have to make sure the shapes of the targets somehow get updated to match the new data, even though you’re not using targets. This is discussed a bit more in-depth here.

Topic		Replies	Views
Using sample posterior predictive on new data v5 modeling	4	46	April 30, 2025
Posterior Predictive Sampling -- Works for Hierarchical Model?	3	2154	June 12, 2019
Prediction using sample_ppc in Hierarchical model Questions from_github	6	4972	December 14, 2017
Predictions with PYMC, Dont work on test data set modeling	3	40	January 18, 2025
How to use the posterior predictive distribution for checking a model from PyMC version agnostic arviz , model-checking	10	4135	March 14, 2023

Trouble understanding how to use sample_posterior_predictive to generate a prediction

Related topics