The paper I linked was for unconditional models, so I’ll switch here to the notation for linear regression with y conditional on x.
This calibration assessment only works on data generated so that you know the true parameters. In the case of linear regression you can use whatever `x` you want, then for `M` trials independently generate your own `theta_i ~ p(theta)` and `y_i ~ p(y | x, theta_i)`.
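For concreteness, here is a minimal sketch of that trial-generation step. Everything here is a placeholder assumption, not something from the paper: a single slope parameter with a normal prior, a fixed `x`, and a known noise standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 200                      # number of calibration trials
n = 50                       # observations per trial
x = rng.uniform(-1, 1, n)    # any fixed design you like
sigma = 1.0                  # known observation noise sd (assumed for this toy model)

# Prior on the slope: theta ~ Normal(prior_mean, prior_sd^2)
prior_mean, prior_sd = 0.0, 2.0

# theta_i ~ p(theta), then y_i ~ p(y | x, theta_i), independently for each trial
true_thetas = rng.normal(prior_mean, prior_sd, M)
ys = true_thetas[:, None] * x[None, :] + rng.normal(0.0, sigma, (M, n))
```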
Then given the posterior for a trial,
p(theta | x_i, y_i), you need to calculate the value of the true parametertheta_iin that posterior’s CDF, i.e.U_i = F_i(theta_i). If you have the closed form CDF you can use numpy’s quantile method. Or if you have enough samples from the posterior you can calculate the empirical quantile as the rank order oftheta_iwithin the sorted samples. One way isU_i = np.sum(samples < true_params_values) / num_samples. -
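Continuing the toy conjugate setup sketched above: with a normal prior and known noise sd the posterior over the slope is also normal, so `F_i` is available in closed form. The rank-based alternative for the sampled case is noted in a comment.

```python
from scipy import stats

# Conjugate posterior for each trial: Normal(post_mean_i, post_sd^2).
# post_prec = 1/prior_sd^2 + sum(x^2)/sigma^2 (same for every trial, since x is fixed)
post_prec = 1.0 / prior_sd**2 + np.sum(x**2) / sigma**2
post_sd = np.sqrt(1.0 / post_prec)
post_means = (prior_mean / prior_sd**2 + ys @ x / sigma**2) / post_prec

# U_i = F_i(theta_i): evaluate each trial's posterior CDF at the true parameter
U = stats.norm.cdf(true_thetas, loc=post_means, scale=post_sd)

# If you only have posterior samples (e.g. from MCMC), use the empirical rank instead:
# U[i] = np.sum(samples_i < true_thetas[i]) / num_samples
```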
Once you have `U_{1:M}` you can get a summary statistic using the Kolmogorov-Smirnov goodness-of-fit test, which essentially tests the coverage of every credible interval. You can also qualitatively examine empirical CDF or quantile-quantile (QQ) plots as in Figure 1b of the linked paper. The larger `M`, the smoother those curves will be; 20 trials may be enough to start getting a feel for the curve's shape, it will probably start to clean up around 100, and look quite clean around 500 or 1000 trials, though that likely depends on the data and the problem.
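A rough sketch of that summary step on the `U` array from above, using `scipy.stats.kstest` against `Uniform(0, 1)` plus an empirical-CDF plot; the plotting details are just one way to do it.

```python
import matplotlib.pyplot as plt

# If the model and posterior computation are well calibrated, U_{1:M} ~ Uniform(0, 1).
ks_stat, p_value = stats.kstest(U, "uniform")   # one-sample KS test against Uniform(0, 1)
print(f"KS statistic {ks_stat:.3f}, p-value {p_value:.3f}")

# Empirical CDF of U against the diagonal (the Uniform(0, 1) CDF),
# analogous to the qualitative plots in the paper.
U_sorted = np.sort(U)
ecdf = np.arange(1, len(U) + 1) / len(U)
plt.plot(U_sorted, ecdf, label="empirical CDF of U")
plt.plot([0, 1], [0, 1], "k--", label="Uniform(0, 1) CDF")
plt.xlabel("U")
plt.ylabel("CDF")
plt.legend()
plt.show()
```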