Hey everyone,
I’ve been reading a bit about predictively oriented (PrO) posteriors and trying to understand how they behave beyond the high-level motivation.
From what I understand, the key idea is that the posterior is defined through the induced predictive distribution rather than through fit at a single parameter value. As a result, unlike standard Bayesian posteriors (which concentrate to a point mass even under misspecification, just at the KL-closest parameter), PrO posteriors only collapse to a point mass when the model is exactly well specified; otherwise they stabilise to a non-degenerate distribution, where the remaining spread reflects model misspecification rather than lack of data.
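For concreteness, here is the schematic form I have in mind (treat this as my paraphrase, not the definition from any particular paper; the specific divergence $D$ and regulariser vary across formulations):

```latex
q^\star \;=\; \operatorname*{arg\,min}_{q \in \mathcal{P}(\Theta)}
\; D\!\left( p_0 \,\middle\|\, \int_\Theta p(\cdot \mid \theta)\, q(\mathrm{d}\theta) \right)
\;+\; \lambda\, \mathrm{KL}\!\left( q \,\|\, \pi \right)
```

where $p_0$ is the data distribution, the integral is the posterior predictive mixture, and $\pi$ is the prior. If some $\theta^\star$ gives $p(\cdot \mid \theta^\star) = p_0$ exactly, a point mass at $\theta^\star$ drives the first term to zero; under misspecification no point mass can, and a genuine mixture over $\theta$ can achieve a strictly smaller divergence than any single $\theta$, which (as I understand it) is why the limit stays non-degenerate.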
What I’m still unclear about is what this limiting object actually looks like mathematically. In particular, I’m not sure under what conditions the predictively optimal posterior is unique, or how its variance relates quantitatively to the degree of misspecification. It also isn’t obvious to me how sensitive this behaviour is to the choice of predictive divergence being optimised.
On the computational side, I’ve seen proposals to sample PrO posteriors using mean-field Langevin dynamics, but I’m trying to understand how closely the resulting particle system actually tracks the intended predictive objective, especially in higher-dimensional or misspecified settings.
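To make my question concrete, here is a minimal sketch of the kind of particle system I mean, assuming the objective is the expected negative log predictive score with entropy regularisation (a 1-D Gaussian location model on deliberately bimodal data; all names and constants are my own toy choices, not taken from any paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Deliberately misspecified setup: the model is N(theta, 1), but the data
# come from a two-component mixture, so no single theta matches p_0.
y = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(2.0, 1.0, 150)])

n_particles = 300
lam = 0.1    # entropy-regularisation strength (assumed; controls residual spread)
eta = 0.05   # step size
theta = rng.normal(0.0, 1.0, n_particles)

def lik(y, th):
    """N(y; th, 1) densities, shape (n_data, n_particles)."""
    return np.exp(-0.5 * (y[:, None] - th[None, :]) ** 2) / np.sqrt(2 * np.pi)

for _ in range(1500):
    P = lik(y, theta)                # per-particle likelihoods at each data point
    m = P.mean(axis=1)               # particle estimate of the predictive mixture
    # Gradient of the first variation of F(q) = -(1/n) sum_y log m_q(y)
    # at each particle: -(1/n) sum_y grad_theta p(y | theta_i) / m(y).
    grad = -((y[:, None] - theta[None, :]) * P / m[:, None]).mean(axis=0)
    # Mean-field Langevin step: the injected noise implements the entropy
    # term, so the stationary law is proportional to exp(-(dF/dq)/lam).
    theta = theta - eta * grad + np.sqrt(2 * eta * lam) * rng.normal(size=n_particles)

# Under misspecification the particle cloud should settle without collapsing:
print(theta.mean(), theta.std())
```

In this toy, the particles should spread out to cover both data modes rather than concentrate, which is the non-collapse behaviour described above; my question is essentially how faithfully this kind of interacting-particle discretisation tracks the intended predictive objective once the dimension grows.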
I’d really appreciate pointers to references, toy examples, or existing implementations that helped others build intuition around these questions. Thanks!