How to get the labels for the predictive distribution?

I have a model in bambi, and I can get the predictive distribution for new data in this model into a dataframe by running model.predict(…).to_dataframe(). But when I do, I get a long list of responses for each draw and each observation. Is there a way to list the observation, draw, and chain that generated each prediction?

I am not super familiar with bambi, but I think predict() returns a standard arviz.InferenceData object, and you are probably only interested in its posterior_predictive group. So instead of converting the entire object to a dataframe, you probably just want to convert that group. Since you are converting the return value of predict(), I assume you have set the inplace argument to False:

ppc = model.predict(idata, kind='pps', inplace=False)['posterior_predictive']
# convert to pandas dataframe if you like
print(ppc.to_dataframe())

Given the defaults, I think the idea is to use the InferenceData object as cumulative storage, with predict() adding the new group to it in place:

idata = model.fit()
model.predict(idata, kind='pps')
print(idata['posterior_predictive'].to_dataframe())
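To get the chain, draw, and observation labels as ordinary columns, it should be enough to call reset_index() on the dataframe built from the group, since xarray puts those coordinates into a MultiIndex. Here is a minimal sketch that uses a small synthetic dataset in place of idata['posterior_predictive'], so it runs without a fitted model. The variable name "y" and the observation dimension name "y_obs" are assumptions for illustration; check the dims of your own group for the real names.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for idata["posterior_predictive"]:
# 2 chains, 3 draws per chain, 4 observations.
# "y" and "y_obs" are assumed names, not necessarily what bambi uses.
rng = np.random.default_rng(0)
ppc = xr.Dataset(
    {"y": (("chain", "draw", "y_obs"), rng.normal(size=(2, 3, 4)))},
    coords={
        "chain": np.arange(2),
        "draw": np.arange(3),
        "y_obs": np.arange(4),
    },
)

# to_dataframe() indexes rows by (chain, draw, y_obs);
# reset_index() turns those index levels into regular columns.
long_df = ppc.to_dataframe().reset_index()
print(long_df.head())
```

Each row of long_df then holds one prediction together with the chain, draw, and observation it came from.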

@cluhmann points in the right direction.

Model.predict() either modifies the arviz.InferenceData object you pass in or returns a modified copy. With kind="mean" it adds a new variable to the .posterior group (its name is the name of the response with _mean appended). With kind="pps" it draws posterior predictive samples and adds them to the .posterior_predictive group.
