Regression predicted values in pymc

import pymc as pm
import pandas as pd
import arviz as az
import matplotlib.pyplot as plt
import numpy as np

print(f"Runing on PyMC v{pm.__version__}")

df_penguins=pd.read_csv('penguins.csv').dropna(how = 'any').reset_index(drop=True)
flipper_length_obs = df_penguins.loc[df_penguins['species']=='Adelie',"flipper_length_mm"].values 

with pm.Model() as model_adelie_flipper_regression:
    data = pm.MutableData("data", flipper_length_obs)
    β_0 = pm.Normal("β_0", 0, 4000)
    β_1 = pm.Normal("β_1", 0, 4000)
    σ = pm.HalfNormal("σ", 2000) 
    μ = pm.Deterministic("μ", β_0 + β_1 * data)
    y = pm.Normal("y", mu=μ, sigma=σ, observed=mass_obs)

    idata_r = pm.sample(chains=2)

fig, ax = plt.subplots()

beta_m =z['mean'][1]

flipper_length = np.linspace(flipper_length_obs.min(), flipper_length_obs.max(), 100)

flipper_length_mean = alpha_m + beta_m * flipper_length
ax.plot(flipper_length, flipper_length_mean, c='C4',
         label=f'y = {alpha_m:.2f} + {beta_m:.2f} * x')

ax.scatter(flipper_length_obs, mass_obs,marker='.')
az.plot_hdi(flipper_length_obs, idata_r.posterior['μ'], hdi_prob=0.94, color='lightblue', ax=ax)

ax.set_xlabel('Flipper Length')

This last bit of code is not working for me:I am trying to obtain the posterior predictive mass values ‘y’ for a single flipper_length=190. It was originally written in pymc3 and I suspect I am porting it incorrectly to pymc v4.4.0. or not understanding it correctly.

 # Change the underlying value to the mean observed flipper length to 190 and get predicted mass 'y' distribution.
with model_adelie_flipper_regression:
    pm.set_data({"data": [190]})
    ppcs_at_190 = pm.sample_posterior_predictive(idata_r.posterior, var_names=["y", "μ"])

fig, ax = plt.subplots()
az.plot_dist(ppcs_at_190["y"], label="Posterior Predictive of \nIndividual Penguin Mass", ax=ax)
az.plot_dist(ppcs_at_190["μ"],label="Posterior Predictive of μ", color="C4", ax=ax) . 

It would be great if anybody could clue me in to also adding az.hdi() lines to the above graph for the predicted mass values ‘y’ for eaclh flipper length ‘x’. Thanking you, Declan

I think you can use arviz’s plot_posterior() function instead of plot_dist() and you get HDIs by default.

Hi Cluhmann,
Thank you for your reply. Unfortunately, the code when written in the newer pymc produces an error at the last 2 lines of code:az.plot_dist(ppcs_at_190[“y”], label=“Posterior Predictive of \nIndividual Penguin Mass”, ax=ax), irrespective of whether az.plot_dist() or az.plot_posterior() is used.

KeyError: ‘y’

This is not surprising as ‘y’ is not in the idata_r.posterior inference object… If I use idata_r.posterior_predictive[‘y’].values.flatten() this gives me all the predicted ‘y’ irrespective of the ‘x’ value(‘flipper length’). However the problem is to be able to make predictions at particular chosen 'x’s

What I am trying to get to work in the newer pymc using the inference idata_r object is:
From the idata_r object:
(a) to get the ppc[‘y’] distribution at a given ‘x’ = 190 flipper length ( conditional on ‘x’ ppc)

In other words how do you write the line of code?:

(b) to add the ppc[‘y’] hdi lines to the graph showing visually the variation in the predictive spread in the y values conditional on ‘x’.

In other words how do you write the line of code?:
az.plot_hdi('predicted hdi-values of y conditional on ‘x’)

Any help on this would be great:–my own attempts one of illustrated above fails.
Thanking you,Declan.

I think you can just do something like this?

az.plot_posterior(ppcs_at_190, var_names=["y", "μ"], ax=ax)

You should be able to add the labels afterwards.

Thank you for your reply. The problem is we do not have a correct line of code for: ppcs_at_190. Osvaldo Martin solves the prediction y at a given x for linear regression at :

BookCode_Edition1/chp_03.ipynb at main · BayesianModelingandComputationInPython/BookCode_Edition1 · GitHub](BookCode_Edition1/chp_03.ipynb at main · BayesianModelingandComputationInPython/BookCode_Edition1 · GitHub) or in his excellent book page 82.but it is written in PYMC3. This is the exact same coding used in my code above:

with model_adelie_flipper_regression:
    pm.set_data({"flipper_length": [190]})
    ppcs_at_190 = pm.sample_posterior_predictive(idata_r.posterior, var_names=["y", "μ"])

fig, ax = plt.subplots()
az.plot_posterior(ppcs_at_190["y"], label="Posterior Predictive of Individual Penguin Mass given flipper length of 190mm", ax=ax)

But this code will not work in PYMC. v4.4.0. how do you port this into new PYMC and achieve (a) and (b) above.

Being able to predict y for a given x is a standard required requirement and very often the purpose of preceding with linear regression; I believe the extraction method from the idata object of the ppcs/given x for PYMC. v4.4.0 is all that is needed. Also the correct way to parameterise the az.plot_hdi() for the predicted y values/given x would allow the predicted hdi to be added to the graph. So if anyone has an easy way to port the pymc3 code for the predected y/given x values or script the code directly I would be grateful. Declan.

Does this work?

with model_adelie_flipper_regression:
    pm.set_data({"data": [190]})
    ppcs_at_190 = pm.sample_posterior_predictive(
        idata_r.posterior, var_names=["y", "μ"]

fig, ax = plt.subplots()
    label="Posterior Predictive of \nIndividual Penguin Mass",
    label="Posterior Predictive of μ",

brialliant well done
wow! At this stage I almost dont believe it. I have been trying every possible variation to get the ppcs_at_190 , since I sent in the query but no success.
Many thanks for getting a novice on the right path. Declan.

1 Like