I don’t have a code sample, since the model I’m running is fairly long, but I would expect a similar issue to arise in the larger Bayesian Neural Network examples.
If I run a fairly large model (i.e., a few thousand parameters), change the random seed, and re-run the problem, I’ll get a fairly similar ELBO trace, but drastically different mean and variance estimates.
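Roughly, the check I’m doing looks like the sketch below. This is only a minimal stand-in, assuming PyMC: the toy regression model, the iteration count, and the seeds are placeholders for my actual (much larger) model, and the summary comparison is just one way of seeing the run-to-run differences in the fitted means and SDs.

```python
import numpy as np
import pymc as pm
import arviz as az

# Toy stand-in data/model; the real model has a few thousand parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=200)

def fit_advi(seed):
    # Fit the same model with mean-field ADVI, varying only the random seed.
    with pm.Model():
        beta = pm.Normal("beta", 0.0, 1.0, shape=5)
        sigma = pm.HalfNormal("sigma", 1.0)
        pm.Normal("obs", mu=pm.math.dot(X, beta), sigma=sigma, observed=y)
        return pm.fit(n=20000, method="advi", random_seed=seed)

a1, a2 = fit_advi(1), fit_advi(2)

# The ELBO traces (approx.hist) end up looking nearly identical ...
print(np.mean(a1.hist[-1000:]), np.mean(a2.hist[-1000:]))

# ... while the fitted posterior means/SDs can differ between runs.
s1 = az.summary(a1.sample(2000), kind="stats")
s2 = az.summary(a2.sample(2000), kind="stats")
print(s1[["mean", "sd"]].join(s2[["mean", "sd"]], rsuffix="_seed2"))
```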
Is there any literature out there on the repeatability of ADVI for large models, or possible diagnostics that might indicate a poor approximation of a potentially multimodal posterior distribution?