Hi, I’m new to using PYMC and I am struggling to do simple stuff like getting the output posterior predictive distribution for a specific yi given specific input feature.
I am basically trying to reproduce figure 3.11 in the associated online PYMC book Code 3: Linear Models and Probabilistic Programming Languages — Bayesian Modeling and Computation in Python
But it seems that updates to the library makes this code not working anymore.
I’ve created a simple model to predict weight given the height:
with pm.Model() as weight_model:
height = pm.Data("Height", bmi['Height'], mutable=True)
weight=pm.Data("Weight", bmi['Weight'], mutable=True)
# Define priors
alpha = pm.Normal("alpha", mu=0, sigma=100) # Prior for intercept
beta = pm.Normal("beta", mu=0, sigma=100) # Prior for slope
sigma = pm.HalfNormal("sigma", sigma=10) # Prior for noise
# Define the likelihood (using Height as the predictor)
mu = pm.Deterministic("mu", alpha + beta * height) # Linear relationship
y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=weight) # Observed data
# Sample from the posterior
trace = pm.sample(1000,tune=2000,return_inferencedata=True) # Sampling the posterior
And then I just want to use the mean value of height as a single input data point to output and plot the predictive distribution of weight for that particular data point
with weight_model:
# Change the underlying value to the mean observed flipper length
# for our posterior predictive samples
pm.set_data({"Height": [bmi['Height'].mean()]})
posterior_predictions = pm.sample_posterior_predictive(
trace.posterior, var_names=["y_obs", "mu"])
fig, ax = plt.subplots()
az.plot_dist(posterior_predictions.posterior_predictive["y_obs"], label="Posterior Predictive of \nIndividual weight", ax=ax)
az.plot_dist(posterior_predictions.posterior_predictive["mu"],label="Posterior Predictive of μ", color="C4", ax=ax)
ax.set_xlim(2900, 4500);
ax.legend(loc=2)
ax.set_xlabel("weight")
ax.set_yticks([])
plt.show()
The problem is that posterior_predictions.posterior_predictive[“y_obs”] returns a 3 dimensional array as such : xarray.DataArray
‘y_obs’
- chain: 4
- draw: 1000
- y_obs_dim_2: 741
It seems that there is a 3rd dimension equal to my original dataset size which seems that the array contains 741 different predictions, each of them having 4000 samples from the posterior, instead of only yielding the single prediction from the mean of height. What am I doing wrong??
Thanks for the help.