There are a few points to keep in mind before you compare the fitting accuracy between MCMC and ADVI:
- Convergence. Are you sure ADVI has already converged? Did you check the ELBO history? It is often observed that VI is very sensitive to training time (e.g., see https://arxiv.org/abs/1802.02538).
- Optimizer. Related to the first point, sometimes the default SGD-style optimizer does not perform well. Try Adam.
- Stochastic nature of VI in PyMC3. PyMC3 uses Monte Carlo samples to approximate the objective gradients. As a consequence, the result of the fit is stochastic: you can see this in the loss history, which is not monotonically decreasing. So when you stop the training, VI returns the fit from the last iteration, which may happen to have a high loss (i.e., a low ELBO). One solution is to increase `obj_n_mc`, the number of Monte Carlo samples used for approximation of the objective gradients. Another solution is to take the approximation that minimized the loss over the last few hundred iterations, but I don't think there is code to do that out of the box yet.
Currently these are the points that come to mind. You can have a look at my WIP notebook, which compares VI and MCMC with a similar idea.