Maybe you can try SMC; it might give a better weighting when you run lots of chains.
ADVI tends to underestimate the variance, so if anything you should try the default jitter+adapt_diag initialization instead. Ideally you want a large enough energy proposal to mix across the different modes. It might also be interesting to try adapt_diag_grad as the initialization.
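As a rough illustration of why a mean-field approximation like the one ADVI uses by default tends to be too narrow: for a Gaussian target with covariance Σ and precision Λ = Σ⁻¹, the mean-field factor minimizing KL(q‖p) has variance 1/Λ_ii, which is never larger than the true marginal variance Σ_ii. The sketch below is not PyMC code. It only does the closed-form arithmetic for a made-up 2-D correlated Gaussian (the correlation 0.9 is an arbitrary assumption).

```python
# Hypothetical 2-D Gaussian target with strongly correlated coordinates (assumed values).
rho = 0.9
cov = [[1.0, rho], [rho, 1.0]]


def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


precision = inverse_2x2(cov)

# Mean-field Gaussian minimizing KL(q || p): each factor's variance is 1 / Lambda_ii.
mean_field_var = [1.0 / precision[i][i] for i in range(2)]
# True marginal variances are the diagonal of the covariance.
true_var = [cov[i][i] for i in range(2)]

for i in range(2):
    print(f"x{i}: true variance = {true_var[i]:.2f}, mean-field variance = {mean_field_var[i]:.2f}")
```

With ρ = 0.9 the mean-field variance comes out as 1 − ρ² = 0.19 against a true variance of 1.0, so a mass matrix initialized from such an approximation would be far too small in the correlated directions.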