Rejecting/thinning modes based on average density

Maybe you could try SMC; it might give a better weighting across the modes, since it works with lots of chains.
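To illustrate why a population method can help here, below is a toy numpy sketch of tempered SMC. This is not PyMC's SMC implementation. The target is a hypothetical 1-D mixture with modes at -4 and +4 carrying 30% and 70% of the mass, and the broad N(0, 5^2) starting distribution, the linear temperature schedule, and the random-walk step size are all assumptions for the example. Once the particles cluster, the random-walk moves essentially never cross between the modes. The importance reweighting between temperatures still gives each mode roughly its correct share, which is the kind of weighting you do not get from independent chains that are each stuck in one mode.

```python
import numpy as np

def log_prior(x):
    # Broad N(0, 5^2) starting distribution, up to an additive constant.
    return -0.5 * (x / 5.0) ** 2

def log_target(x):
    # Hypothetical bimodal target: 0.3*N(-4, 0.5^2) + 0.7*N(4, 0.5^2),
    # up to an additive constant (both components share the same scale).
    return np.logaddexp(np.log(0.3) - 0.5 * ((x + 4.0) / 0.5) ** 2,
                        np.log(0.7) - 0.5 * ((x - 4.0) / 0.5) ** 2)

def log_tempered(x, beta):
    # Geometric path between the two: prior^(1 - beta) * target^beta.
    return (1.0 - beta) * log_prior(x) + beta * log_target(x)

def systematic_resample(weights, rng):
    # One uniform offset shared by N evenly spaced positions.
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return np.minimum(idx, n - 1)  # guard against float round-off in cumsum

def smc(n_particles=5000, n_temps=50, n_mh=5, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 5.0, size=n_particles)  # exact draws at beta = 0
    logw = np.zeros(n_particles)
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Reweight: incremental importance weight for moving from b_prev to b.
        logw += (b - b_prev) * (log_target(x) - log_prior(x))
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Resample only when the effective sample size drops below N/2.
        if 1.0 / np.sum(w ** 2) < n_particles / 2:
            x = x[systematic_resample(w, rng)]
            logw = np.zeros(n_particles)
        # Mutate: random-walk Metropolis steps that leave the tempered
        # distribution at the current beta invariant.
        for _ in range(n_mh):
            prop = x + step * rng.normal(size=n_particles)
            log_ratio = log_tempered(prop, b) - log_tempered(x, b)
            accept = np.log(rng.random(n_particles)) < log_ratio
            x = np.where(accept, prop, x)
    w = np.exp(logw - logw.max())
    return x, w / w.sum()

x, w = smc()
right_mass = np.sum(w * (x > 0))
print(f"estimated mass of the mode at +4: {right_mass:.2f}")  # should be near 0.7
```

The mode weights come from the reweighting step, not from the Metropolis moves mixing between modes, so this works even though no single particle ever crosses the gap late in the schedule.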

ADVI tends to underestimate the variance, so if anything you should try the default jitter+adapt_diag initialization instead. Ideally you want the energy proposals to be large enough to mix across the different modes. It might also be interesting to try adapt_diag_grad as the initialization.
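A small numpy illustration of the variance point, assuming for simplicity that the posterior is a correlated 2-D Gaussian (the correlation of 0.9 is made up). Mean-field ADVI fits a factorized Gaussian by minimizing KL(q || p). For a Gaussian target, that optimum keeps the true means and sets each variance to 1 / precision[i, i], which is never larger than the true marginal variance and is much smaller when the parameters are correlated. This uses the closed-form optimum rather than running ADVI itself.

```python
import numpy as np

# Hypothetical correlated 2-D Gaussian "posterior" (zero mean).
rho = 0.9
cov = np.array([[1.0, rho], [rho, 1.0]])
precision = np.linalg.inv(cov)

# Closed-form optimum of KL(q || p) over factorized Gaussians q:
# same means, variance_i = 1 / precision[i, i].
mf_var = 1.0 / np.diag(precision)
true_var = np.diag(cov)

def kl_q_p(s_diag, cov):
    # KL(q || p) for q = N(mu, diag(s_diag)), p = N(mu, cov), equal means.
    prec = np.linalg.inv(cov)
    k = len(s_diag)
    return 0.5 * (np.sum(np.diag(prec) * s_diag) - k
                  + np.linalg.slogdet(cov)[1] - np.sum(np.log(s_diag)))

print("true marginal variances:", true_var)
print("mean-field variances:   ", mf_var)  # both equal 1 - rho**2 = 0.19, up to rounding
print("KL at mean-field optimum:", kl_q_p(mf_var, cov))
print("KL at true variances:    ", kl_q_p(true_var, cov))
```

The KL objective prefers the too-narrow factorized fit over one with the correct marginal variances, so a mass matrix estimated from such a fit tends to be too small.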