The problem is that the ADVI algorithm doesn’t work very well as originally written. See:
- Welandawe et al. A Framework for Improving the Reliability of Black-box Variational Inference.
- Domke et al. Advances in Black-Box VI.
The problem with ADVI is that it could really use (a) many more evaluations of the log density and its gradient per ELBO estimate (thousands are ideal in order to stabilize the stochastic gradient), (b) the sticking-the-landing (STL) reparameterization gradient estimator, unless you’re going to use even more log density and gradient evaluations per ELBO estimate, and (c) better step size adaptation.
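To make point (b) concrete, here is a minimal NumPy sketch comparing the standard reparameterization gradient with the sticking-the-landing estimator. Everything in it is an assumption for illustration and is not Stan's implementation: the one-dimensional normal target with mean 1 and scale 2, the hand-derived gradients, and the hypothetical helpers `grad_log_p` and `elbo_grads`.

```python
import numpy as np

def grad_log_p(z, m=1.0, s=2.0):
    """Gradient of the toy target log density, log N(z | m, s^2)."""
    return -(z - m) / s**2

def elbo_grads(mu, omega, eps, stl):
    """Per-draw reparameterization gradients of the ELBO wrt (mu, omega).

    The approximation q is N(mu, exp(omega)^2), with z = mu + exp(omega) * eps.
    """
    sigma = np.exp(omega)
    z = mu + sigma * eps
    if stl:
        # Path derivative only: differentiate log p(z) - log q(z) through z,
        # holding the variational parameters inside log q fixed.
        dz = grad_log_p(z) + eps / sigma  # d/dz [log p(z) - log q(z)]
        return dz, dz * sigma * eps
    # Total derivative: along the path, log q(z) = -omega - eps^2 / 2 + const,
    # so -log q contributes 0 to the mu gradient and +1 to the omega gradient.
    return grad_log_p(z), grad_log_p(z) * sigma * eps + 1.0

rng = np.random.default_rng(0)
eps = rng.standard_normal(10_000)

# Evaluate both estimators at the exact solution q = p (mu = 1, sigma = 2).
mu_opt, omega_opt = 1.0, np.log(2.0)
std_mu, std_omega = elbo_grads(mu_opt, omega_opt, eps, stl=False)
stl_mu, stl_omega = elbo_grads(mu_opt, omega_opt, eps, stl=True)

print("standard sd (mu, omega):", std_mu.std(), std_omega.std())
print("STL sd (mu, omega):     ", stl_mu.std(), stl_omega.std())
```

In this toy case the STL gradients are zero for every draw once q matches the target, while the standard estimator stays noisy there. That residual noise is why the standard estimator needs many more draws per ELBO estimate to settle down near the solution.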
There’s also the not insubstantial problem that the approximation is only ever a multivariate normal on the unconstrained scale, either with dense covariance or completely factored with diagonal covariance. This limits the kinds of distributions that can be modeled well (e.g., in both the dense and diagonal versions of ADVI, every positive-constrained parameter gets a lognormal marginal posterior and every unconstrained parameter gets a normal marginal posterior).
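The restriction on marginals can be checked with a small simulation. This is a sketch under assumed values, not output from any real model: the means and scales are made up, and the positive-constrained parameter is assumed to use a log transform, so the inverse transform is `exp`.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.5, -1.0])     # assumed means on the unconstrained scale
sigma = np.array([0.8, 1.5])   # assumed diagonal (factored) scales

# Draw from the factored normal approximation on the unconstrained scale.
zeta = mu + sigma * rng.standard_normal((100_000, 2))

# Map back to the constrained scale: the first parameter is
# positive-constrained (exp inverse transform), the second is unconstrained.
theta_pos = np.exp(zeta[:, 0])
theta_free = zeta[:, 1]

# The positive parameter is lognormal, so its mean is exp(mu + sigma^2 / 2).
print("sample mean vs. lognormal mean:",
      theta_pos.mean(), np.exp(mu[0] + sigma[0] ** 2 / 2))

# The factored family has no correlation between unconstrained coordinates.
print("correlation on unconstrained scale:", np.corrcoef(zeta.T)[0, 1])
```

Whatever the true posterior looks like, the draws for the positive parameter can only ever be lognormal, and in the diagonal case the unconstrained coordinates are independent by construction.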