How to get posterior predictive distribution sample data for a single prediction?

Hi, I’m new to using PYMC and I am struggling to do simple stuff like getting the output posterior predictive distribution for a specific yi given specific input feature.

I am basically trying to reproduce figure 3.11 in the associated online PYMC book Code 3: Linear Models and Probabilistic Programming Languages — Bayesian Modeling and Computation in Python

But it seems that updates to the library makes this code not working anymore.

I’ve created a simple model to predict weight given the height:

with pm.Model() as weight_model:

    height = pm.Data("Height", bmi['Height'], mutable=True)
    weight=pm.Data("Weight", bmi['Weight'], mutable=True)
    
    # Define priors
    alpha = pm.Normal("alpha", mu=0, sigma=100)  # Prior for intercept
    beta = pm.Normal("beta", mu=0, sigma=100)    # Prior for slope
    sigma = pm.HalfNormal("sigma", sigma=10)    # Prior for noise

    # Define the likelihood (using Height as the predictor)
    mu = pm.Deterministic("mu", alpha + beta * height)           # Linear relationship
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=weight)  # Observed data

    # Sample from the posterior
    trace = pm.sample(1000,tune=2000,return_inferencedata=True)  # Sampling the posterior

And then I just want to use the mean value of height as a single input data point to output and plot the predictive distribution of weight for that particular data point

with weight_model:
    # Change the underlying value to the mean observed flipper length
    # for our posterior predictive samples
    pm.set_data({"Height": [bmi['Height'].mean()]})
    posterior_predictions = pm.sample_posterior_predictive(
        trace.posterior, var_names=["y_obs", "mu"])

fig, ax = plt.subplots()
az.plot_dist(posterior_predictions.posterior_predictive["y_obs"], label="Posterior Predictive of \nIndividual weight", ax=ax)
az.plot_dist(posterior_predictions.posterior_predictive["mu"],label="Posterior Predictive of μ", color="C4", ax=ax)
ax.set_xlim(2900, 4500);
ax.legend(loc=2)
ax.set_xlabel("weight")
ax.set_yticks([])
plt.show()

The problem is that posterior_predictions.posterior_predictive[“y_obs”] returns a 3 dimensional array as such : xarray.DataArray

‘y_obs’

  • chain: 4
  • draw: 1000
  • y_obs_dim_2: 741

It seems that there is a 3rd dimension equal to my original dataset size which seems that the array contains 741 different predictions, each of them having 4000 samples from the posterior, instead of only yielding the single prediction from the mean of height. What am I doing wrong??

Thanks for the help.

You should use pm.set_data in this case. There is an applied example here that seems quite similar to the code you’ve shared.