Using sample posterior predictive on new data

Alexander_Grunewald · April 29, 2025, 9:35pm

Hello! I am very new to PyMC and to Bayesian modeling in general. I am working through the example code from Chapter 8 of Statistical Rethinking and trying to replicate it in PyMC, but I run into dimensionality errors when I try to apply my model to a new data set. I saw a post that already addresses this topic (Setting new data for predictions, conflicting size with dims - Questions / version agnostic - PyMC Discourse), but I am still confused about how to fix the issue. Any clarification would be appreciated. Here is my model setup and the code I use to run posterior predictive sampling:

import numpy as np
import pandas as pd
import pymc as pm

continent_labels, continent = pd.factorize(df_standard.cont_africa)

coord = {
    "features": ["rugged_std"],
    "obs_id": np.arange(df_standard.shape[0]),
    "continent": continent.values
}
with pm.Model(coords=coord) as m8_3:
    rugged_std = pm.Data("rugged_std", df_standard.rugged_std.values, dims="obs_id")
    continent_indx = pm.Data("continent_indx", continent_labels, dims="obs_id")

    # Priors
    alpha = pm.Normal("alpha", mu=1, sigma=0.1, dims="continent")
    beta = pm.Normal("beta", mu=0, sigma=0.3, dims="features")
    sigma = pm.Exponential("sigma", 1)

    # Deterministic
    mu = pm.Deterministic("mu", alpha[continent_indx] + (rugged_std - rugged_std.mean()) * beta[0], dims="obs_id")

    # Likelihood
    y = pm.Normal("y", mu=mu, sigma=sigma, observed=df_standard.log_gdp_std, dims="obs_id")

with m8_3:
    idata3 = pm.sample_prior_predictive(draws=100)
    # Extend instead of reassigning, so the prior predictive draws are kept
    idata3.extend(pm.sample(idata_kwargs={"log_likelihood": True}))
    idata3.extend(pm.sample_posterior_predictive(idata3))

I get the error in this code chunk:

rugged_seq = np.linspace(-0.1, 1.1, 30)
continent_pred = np.repeat(0, len(rugged_seq))

with m8_3:
    pm.set_data({
        "rugged_std": rugged_seq,
        "continent_indx": continent_pred
    }, coords={"obs_id": np.arange(rugged_seq.shape[0])})

    mu_pred = pm.sample_posterior_predictive(idata3, var_names=["mu"])

With this error message:

ValueError: conflicting sizes for dimension 'obs_id': length 170 on the data but length 30 on coordinate 'obs_id'

Thanks for any help!

Dekermanjian · April 30, 2025, 10:46am

Hi @Alexander_Grunewald,

I believe the mismatch is in your target variable y. You can either pass zeros of the new shape for y to pm.set_data() (these values aren’t used when computing your posterior predictive):

pm.set_data({
    "rugged_std": rugged_seq,
    "continent_indx": continent_pred,
    "y": np.zeros_like(rugged_seq)
}, coords={"obs_id": np.arange(rugged_seq.shape[0])})

or you can pass the argument predictions=True to pm.sample_posterior_predictive():

mu_pred = pm.sample_posterior_predictive(idata3, var_names=["mu"], predictions=True)

jessegrabowski · April 30, 2025, 12:24pm

y isn’t a pm.Data in the provided code, so you won’t be able to call set_data on it. You can create y_data = pm.Data('y_data', df_standard.log_gdp_std, dims=['obs_id']), pass it to the likelihood as observed=y_data, and then pass y_data to set_data.

But in general this is a “sharp edge” of PyMC. When doing out-of-sample prediction, you need to pass in dummy data to update the static shape of the targets, even though the predictions won’t be conditioned on the observed values.
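The wording of the original error matches the size check xarray performs when it builds a labelled array. That suggests (an inference, since the full traceback was not posted) the failure happens when results are packaged into InferenceData: after set_data, the obs_id coordinate has 30 entries, but the observed y still holds the original 170 values. A minimal reproduction with xarray alone, using zeros as placeholder values:

```python
import numpy as np
import xarray as xr

obs_id_new = np.arange(30)   # coordinate length after set_data
y_observed = np.zeros(170)   # stale target that still has the training length

message = ""
try:
    xr.DataArray(y_observed, coords={"obs_id": obs_id_new}, dims=["obs_id"])
except ValueError as err:
    message = str(err)
    print(message)

# With a dummy target of the new length, the same construction succeeds
y_dummy = np.zeros_like(obs_id_new, dtype=float)
y_da = xr.DataArray(y_dummy, coords={"obs_id": obs_id_new}, dims=["obs_id"])
print(y_da.sizes["obs_id"])  # 30
```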

Dekermanjian · April 30, 2025, 1:22pm

Ah yes, I totally missed that the target was not made into a pm.Data() object. Thank you for catching that @jessegrabowski!

I also want to add something I have run into in the past: if your target variable has missing values and you want automatic imputation, then you won’t be able to turn the target into a pm.Data() object. In that case, I believe you need to pass the target directly to your likelihood and specify a separate model specifically for out-of-sample predictions. Here is a resource that I go back to time and time again for out-of-sample predictions.
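When rebuilding a separate prediction model is inconvenient, a lighter-weight option (a swapped-in technique, not the approach from the linked resource) is to evaluate mu directly from the posterior draws with NumPy, since mu is a deterministic function of alpha and beta. The draws below are random stand-ins shaped (chain, draw, ...); with real results they would come from something like idata3.posterior["alpha"].values. rugged_mean is a hypothetical placeholder for the training-set mean of rugged_std.

```python
import numpy as np

rng = np.random.default_rng(0)
n_chain, n_draw = 4, 1000

# Stand-ins for posterior draws (assumption: replace with idata3.posterior values)
alpha_draws = rng.normal(1.0, 0.1, size=(n_chain, n_draw, 2))  # chain, draw, continent
beta_draws = rng.normal(0.0, 0.3, size=(n_chain, n_draw, 1))   # chain, draw, features
rugged_mean = 0.215  # hypothetical placeholder; use the real training mean

rugged_seq = np.linspace(-0.1, 1.1, 30)
continent_pred = np.repeat(0, len(rugged_seq))

# Broadcasts to (chain, draw, obs): alpha[cid] + (x - mean) * beta[0]
mu_pred = (
    alpha_draws[..., continent_pred]
    + (rugged_seq - rugged_mean) * beta_draws[..., :1]
)

# Posterior mean and an 89% percentile interval for each new ruggedness value
mu_mean = mu_pred.mean(axis=(0, 1))
mu_interval = np.percentile(mu_pred, [5.5, 94.5], axis=(0, 1))
```

This only covers deterministic quantities such as mu; simulating new y values would additionally need the sigma draws and a normal draw per point.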

Alexander_Grunewald · April 30, 2025, 1:22pm

Thank you @jessegrabowski and @Dekermanjian. Setting dummy data for y worked!
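One further caveat with the model as posted, separate from the shape error: mu centers rugged_std with rugged_std.mean(), and because rugged_std is a pm.Data container, that mean is part of the model graph and is recomputed from whatever data is currently set. After set_data, the predictions are therefore centered on the mean of rugged_seq rather than on the training mean. The NumPy sketch below only illustrates the size of that shift; the uniform training values and the slope are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
rugged_train = rng.uniform(0.0, 1.0, 170)  # synthetic stand-in for df_standard.rugged_std
rugged_seq = np.linspace(-0.1, 1.1, 30)

train_mean = rugged_train.mean()  # centering constant the model was fitted with
seq_mean = rugged_seq.mean()      # what rugged_std.mean() evaluates to after set_data
print(seq_mean)                   # approximately 0.5

beta = 0.2  # hypothetical slope, for illustration only
# Difference between predictions centered on seq_mean and on train_mean;
# it is the same constant beta * (train_mean - seq_mean) for every point
shift = (rugged_seq - seq_mean) * beta - (rugged_seq - train_mean) * beta
```

Assuming you want to keep the centering, one fix is to compute the mean once from the training DataFrame before building the model and use that constant inside mu.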

Related topics

Topic	Replies	Views	Activity
Shape issues with sample_posterior_predictive in PyMc5 Questions	3	749	July 1, 2023
Trouble understanding how to use sample_posterior_predictive to generate a prediction	1	1242	August 26, 2023
"shape mismatch" when new data is set as a predictor for sample_posterior_predictive v5 prediction	2	2823	November 20, 2022
Help with Out of Sample Predictions	12	693	August 24, 2023
Predictions with PYMC, Dont work on test data set modeling	3	50	January 18, 2025