On the repeatability of variational inference

I don’t have a code sample, since the model I’m running is fairly large, but I would expect a similar problem to arise in the larger Bayesian neural network examples.
If I run a fairly large model (i.e., a few thousand parameters), change the random seed, and re-run the problem, I’ll get a fairly similar ELBO trace but drastically different mean and variance estimates.
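To make the behaviour concrete, here is a toy sketch (not my actual model, and all names are illustrative): mean-field Gaussian VI on a two-component mixture, where two different initialisations converge to different modes while reaching essentially identical ELBOs. The ELBO here is computed deterministically with Gauss–Hermite quadrature so the only "randomness" is the starting point.

```python
import numpy as np

def log_p(x):
    # target: 0.5 * N(-3, 0.5^2) + 0.5 * N(3, 0.5^2), a symmetric bimodal posterior
    def comp(x, mu, s):
        return -0.5 * np.log(2 * np.pi * s**2) - (x - mu)**2 / (2 * s**2)
    return np.logaddexp(np.log(0.5) + comp(x, -3.0, 0.5),
                        np.log(0.5) + comp(x, 3.0, 0.5))

nodes, weights = np.polynomial.hermite.hermgauss(50)

def elbo(mu, log_sigma):
    # ELBO for q = N(mu, sigma^2): E_q[log p] via Gauss-Hermite quadrature,
    # plus the Gaussian entropy term
    sigma = np.exp(log_sigma)
    x = mu + np.sqrt(2.0) * sigma * nodes
    e_logp = np.sum(weights * log_p(x)) / np.sqrt(np.pi)
    entropy = 0.5 * np.log(2 * np.pi * np.e) + log_sigma
    return e_logp + entropy

def fit(mu0, steps=2000, lr=0.05, eps=1e-5):
    # plain gradient ascent with finite-difference gradients (a stand-in
    # for ADVI's stochastic reparameterisation gradients)
    mu, ls = mu0, 0.0
    for _ in range(steps):
        g_mu = (elbo(mu + eps, ls) - elbo(mu - eps, ls)) / (2 * eps)
        g_ls = (elbo(mu, ls + eps) - elbo(mu, ls - eps)) / (2 * eps)
        mu += lr * g_mu
        ls += lr * g_ls
    return mu, np.exp(ls), elbo(mu, ls)

mu_a, sd_a, elbo_a = fit(-1.0)   # "seed" A: initialised left of zero
mu_b, sd_b, elbo_b = fit(+1.0)   # "seed" B: initialised right of zero
print(mu_a, mu_b, elbo_a, elbo_b)
```

Both runs report nearly the same ELBO, but one fit sits at the mode near −3 and the other near +3, so the posterior mean estimates disagree completely.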

Is there any literature out there on the repeatability of ADVI for large models? Or possible diagnostics that might indicate a poor approximation of a potentially multimodal posterior distribution?

Hi, this is a pretty tricky problem in general, and I don’t think there’s a consensus on post-hoc ADVI diagnostics, but you may find the following paper helpful. Yuling Yao presented his work on PSIS diagnostics at StanCon Helsinki, and it seems quite promising, though your mileage may vary.

[Yes, but Did It Work?: Evaluating Variational Inference](https://arxiv.org/abs/1802.02538)
Yuling Yao, Aki Vehtari, Daniel Simpson, Andrew Gelman
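A rough sketch of the core idea, in case it helps: draw from the variational approximation q, compute the importance ratios p/q, and check how heavy their right tail is. The paper fits a generalized Pareto distribution to the tail and reports its shape k̂; the version below uses a much cruder Hill-type estimate over the top 5% of ratios, just to illustrate the mechanism (everything here is a toy, not the paper's implementation).

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal_pdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def khat(log_ratios, tail_frac=0.05):
    # crude Hill-type estimate of the tail-shape of the importance ratios;
    # the paper instead fits a generalized Pareto distribution to the tail
    r = np.sort(np.exp(log_ratios - log_ratios.max()))  # stabilise against overflow
    m = max(int(tail_frac * len(r)), 5)
    tail = r[-m:]
    return float(np.mean(np.log(tail / tail[0])))

x = rng.standard_normal(4000)            # draws from q = N(0, 1)

# well-matched case: target equals the approximation, ratios are flat
k_good = khat(log_normal_pdf(x, 0.0, 1.0) - log_normal_pdf(x, 0.0, 1.0))

# badly-matched case: q is far narrower than the target N(0, 3^2),
# so the ratios have a heavy right tail
k_bad = khat(log_normal_pdf(x, 0.0, 3.0) - log_normal_pdf(x, 0.0, 1.0))
print(k_good, k_bad)
```

In the paper, k̂ values above roughly 0.7 indicate that the variational approximation is unreliable, which is exactly the regime the badly-matched case lands in here.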