How to get posterior predictive distribution sample data for a single prediction?

ArnoSixer · November 5, 2024, 3:39pm

Hi, I’m new to using PYMC and I am struggling to do simple stuff like getting the output posterior predictive distribution for a specific yi given specific input feature.

I am basically trying to reproduce figure 3.11 in the associated online PYMC book Code 3: Linear Models and Probabilistic Programming Languages — Bayesian Modeling and Computation in Python

But it seems that updates to the library makes this code not working anymore.

I’ve created a simple model to predict weight given the height:

with pm.Model() as weight_model:

    height = pm.Data("Height", bmi['Height'], mutable=True)
    weight=pm.Data("Weight", bmi['Weight'], mutable=True)
    
    # Define priors
    alpha = pm.Normal("alpha", mu=0, sigma=100)  # Prior for intercept
    beta = pm.Normal("beta", mu=0, sigma=100)    # Prior for slope
    sigma = pm.HalfNormal("sigma", sigma=10)    # Prior for noise

    # Define the likelihood (using Height as the predictor)
    mu = pm.Deterministic("mu", alpha + beta * height)           # Linear relationship
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=weight)  # Observed data

    # Sample from the posterior
    trace = pm.sample(1000,tune=2000,return_inferencedata=True)  # Sampling the posterior

And then I just want to use the mean value of height as a single input data point to output and plot the predictive distribution of weight for that particular data point

with weight_model:
    # Change the underlying value to the mean observed flipper length
    # for our posterior predictive samples
    pm.set_data({"Height": [bmi['Height'].mean()]})
    posterior_predictions = pm.sample_posterior_predictive(
        trace.posterior, var_names=["y_obs", "mu"])

fig, ax = plt.subplots()
az.plot_dist(posterior_predictions.posterior_predictive["y_obs"], label="Posterior Predictive of \nIndividual weight", ax=ax)
az.plot_dist(posterior_predictions.posterior_predictive["mu"],label="Posterior Predictive of μ", color="C4", ax=ax)
ax.set_xlim(2900, 4500);
ax.legend(loc=2)
ax.set_xlabel("weight")
ax.set_yticks([])
plt.show()

The problem is that posterior_predictions.posterior_predictive[“y_obs”] returns a 3 dimensional array as such : xarray.DataArray

‘y_obs’

chain: 4
draw: 1000
y_obs_dim_2: 741

It seems that there is a 3rd dimension equal to my original dataset size which seems that the array contains 741 different predictions, each of them having 4000 samples from the posterior, instead of only yielding the single prediction from the mean of height. What am I doing wrong??

Thanks for the help.

jessegrabowski · November 5, 2024, 4:13pm

You should use pm.set_data in this case. There is an applied example here that seems quite similar to the code you’ve shared.

Topic		Replies	Views
Predictions with PYMC, Dont work on test data set modeling	3	55	January 18, 2025
Using sample posterior predictive on new data v5 modeling	4	82	April 30, 2025
How do I predict on new, unseen real data using pm.sample_posterior_predictive? Questions	13	8394	January 7, 2021
Could somebody provide a minimal example for sample_posterior_predictive() Questions	2	418	April 15, 2021
Trouble understanding how to use sample_posterior_predictive to generate a prediction	1	1276	August 26, 2023

How to get posterior predictive distribution sample data for a single prediction?

Related topics