Chains converge to local optima?

I think target accept and multi-modality are largely (not entirely) separate issues. Raising target accept makes the sampler adapt to a smaller step size, and small step sizes just explore difficult posteriors better in general. They follow the surface of the posterior more closely, so chains can crawl through low-probability regions and make it over to a new mode.
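
To make the step size point concrete, here's a rough sketch (not from any sampler's actual code) of a leapfrog integrator on a toy 1-D standard normal target. The function name `leapfrog_energy_error` and the specific step sizes are just my picks for illustration. The energy error is what drives HMC/NUTS acceptance, so a smaller error means the trajectory is tracking the posterior surface more faithfully.

```python
def leapfrog_energy_error(step_size, n_steps, q=1.0, p=0.0):
    """Worst absolute energy error along one leapfrog trajectory.

    Toy target: standard normal, so potential U(q) = q**2 / 2 and dU/dq = q.
    (Hypothetical helper for illustration only.)
    """
    def energy(q, p):
        return 0.5 * (q * q + p * p)

    h0 = energy(q, p)
    worst = 0.0
    for _ in range(n_steps):
        p -= 0.5 * step_size * q   # half step for momentum
        q += step_size * p         # full step for position
        p -= 0.5 * step_size * q   # second half step for momentum
        worst = max(worst, abs(energy(q, p) - h0))
    return worst


# Roughly the same total integration time (~10) with two step sizes.
small = leapfrog_energy_error(step_size=0.1, n_steps=100)
large = leapfrog_energy_error(step_size=1.5, n_steps=7)
print(f"max energy error, step 0.1: {small:.5f}")
print(f"max energy error, step 1.5: {large:.5f}")
```

The small step size gives a much smaller energy error on this toy problem, which is the sense in which it "adheres to the surface" better. It says nothing by itself about whether a chain will actually cross between modes.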

I wouldn’t get too hung up on that though. It’s a losing battle to fix multi-modality by fiddling with the sampler’s parameters. The best option is to think mechanistically about what the two modes represent in terms of your target system. Do both modes make sense given domain knowledge? If not, you might try devising priors or changes to the model’s structure that force the posterior away from the implausible mode. One thing that sometimes helps with multi-modality is an order constraint, especially when the modes come from interchangeable parameters (e.g. the components of a mixture).
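
Here's a minimal sketch of why an order constraint can help, assuming the modes come from label switching in a two-component mixture (which may or may not be your situation). Everything here is made up for illustration: the data, the helper names `mixture_loglik` and `ordered`, and the equal weights. Libraries typically implement the constraint with a transform along these lines, but this is not any particular library's code.

```python
import math


def mixture_loglik(data, mus, sigma=1.0):
    """Log-likelihood of an equal-weight Gaussian mixture (toy example)."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for x in data:
        dens = sum(0.5 * math.exp(-0.5 * ((x - m) / sigma) ** 2) / norm
                   for m in mus)
        total += math.log(dens)
    return total


def ordered(y):
    """Map unconstrained values to a strictly increasing vector.

    x[0] = y[0], x[k] = x[k-1] + exp(y[k]); a common way to build
    an order constraint from unconstrained parameters.
    """
    x = [y[0]]
    for v in y[1:]:
        x.append(x[-1] + math.exp(v))
    return x


data = [-2.1, -1.9, -2.3, 1.8, 2.2, 2.0]  # made-up data

# Swapping the component labels leaves the likelihood unchanged,
# so the unconstrained posterior has two mirror-image modes.
ll_a = mixture_loglik(data, (-2.0, 2.0))
ll_b = mixture_loglik(data, (2.0, -2.0))
print(math.isclose(ll_a, ll_b))

# With the constraint, every unconstrained point maps to mu[0] < mu[1],
# so the mirror-image mode is unreachable.
mu = ordered([2.0, -3.0])
print(mu[0] < mu[1])
```

Note this only removes modes that are pure relabelings of each other. If the two modes are genuinely different explanations of the data, an order constraint won't help and you're back to rethinking priors or model structure.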

The second-best option is to switch inference algorithms. I don’t have a ton of expertise here, but SMC (sequential Monte Carlo) is often mentioned. A surprising amount of the BlackJAX sampling book is about algorithms that work for multi-modal posteriors: The Sampling Book Project

p.s. yeah I’d agree two of your chains are better and it’s a fairly wide gap.
