Calibration Plot using Posterior Predictive Samples in Bambi?

hxk1633 · April 26, 2023, 6:22pm

I’m creating a calibration curve plot to evaluate the performance of a Bayesian logistic regression model. This is what I have been able to generate using the ‘y_mean’ data variable in the ‘posterior’ group in the InferenceData object returned from the Bambi model fitting process:

The code I used to generate the plot:

y_actual = test_data['vote_outcome'].replace({'democrat': 1, 'republican': 0})
y_pred = logit_idata.posterior['vote_outcome_mean'].mean(("chain", "draw"))

df = pd.DataFrame({'prob': y_pred, 'actual': y_actual})
df['prob_bin'] = pd.cut(df.prob, bins=np.arange(0, 1.1, 0.1))

prob_means = df.groupby('prob_bin').prob.mean().reset_index()['prob'].dropna().to_numpy()
tp_rate = df.groupby('prob_bin').actual.mean().reset_index()['actual'].dropna().to_numpy()

traces = []

line_trace = go.Scatter(
    x=np.arange(0,1.1, 0.1),
    y=np.arange(0,1.1, 0.1),
    mode='lines',
    line=dict(color=theme, width=2, dash='dash')
)

traces.append(line_trace)

calibration_trace = go.Scatter(
    x=prob_means,
    y=tp_rate,
    mode='markers+lines',
    line=dict(color=line_color, width=1.5),
    showlegend=True)
traces.append(calibration_trace)

layout = go.Layout(
title='Calibration Plot',
xaxis=dict(title='Forecasted Probability of County Voting Democrat'),
yaxis=dict(title='Actual Proportion of Counties Voting Democratic'),
showlegend=False
)

fig = go.Figure(data=traces, layout=layout)

Instead of using the ‘posterior’ group, I want to get the 94% credible intervals from ‘posterior_predictive’ group and plot it on a calibration curve for each bin. But Bambi returns 1 or 0 for my binary classification problem for posterior predictive samples. I’m trying to create something similar to the plot from FiveThirtyEight:

I’m not sure how to get started on reproducing a similar calibration plot for Bambi logistic regression model. Any guidance or advice is appreciated. Thanks!

tcapretto · May 4, 2023, 5:13pm

I’m not sure on which quantity the second plot is based on. It’s surprising though that the credible lines are along the y axis, which is the axis of the actual win percentage (I would expect it to be horizontal, along the x axis, the axis of the forecasted win percentage).

You could get the values by computing quantiles of y_mean

 logit_idata.posterior['vote_outcome_mean'].quantile((0.025, 0.975), ("chain", "draw"))

I don’t recall if that’s exactly the syntax you need, but it would be along those lines.

Topic		Replies	Views
Lm_plot with bambi version agnostic bambi , arviz	5	734	April 15, 2023
Plot predictions by category in a model with multiple categories version agnostic bambi , modeling	1	49	November 27, 2024
Bambi logistic regression, prior and posterior distributions of probabilities Questions bambi	4	1461	October 7, 2021
Posterior Predictive distributions: beta-binomial models Questions	1	653	March 10, 2021
What does it mean when posterior predictive mean is outside of the credibility values? version agnostic modeling	2	392	July 20, 2022

Calibration Plot using Posterior Predictive Samples in Bambi?

Related topics