What are the differences between NUTS and ADVI?

Hi! I’m new to the Bayesian / PyMC3 world, and I’ve come across the NUTS and ADVI methods for doing inference a lot. My understanding is that you should avoid using the find_MAP function and instead use one of these two. Can anyone provide an overall summary of the differences between them? Is it fair to say that ADVI is faster (but less accurate) than NUTS? Any help (and useful references) is highly appreciated!

Adding my two cents, having used the PyMC3 implementations of both on a range of nonstandard problems:

  1. When ADVI is described as “less accurate”, this is usually a general comment about the inability of variational inference methods to properly characterize posterior variance. For example, the ADVI posterior for a regression coefficient is often too narrow / overconcentrated. This is because VI minimizes a loss (the reverse KL divergence from the approximation to the posterior) that asymmetrically favors too-narrow approximations; see the sketch at the end of this post for a quick comparison of ADVI and NUTS posterior widths.

  2. When working with weird models whose posterior geometries are so difficult that all MCMC methods fail, I have often found that initializing with ADVI first can help get the samplers working. I suspect this is because poorly chosen initial values for the MCMC sampler can lead to horrible numerical issues for Hamiltonian Monte Carlo; in many cases ADVI appears to be somewhat more robust to this.

  3. Some problems are just too big for NUTS (even with a GPU), and ADVI is the only option for fitting the model. I’ve used the minibatched ADVI implementation in PyMC3 on a GPU to train deep convolutional autoencoders with Bayesian regularization and 10 million+ parameters.

As a short summary, I think the worst use case for ADVI is a small dataset with a complicated model structure, while it is at its best for very large models and datasets that require minibatched computation.
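To make point 1 concrete, here is a minimal sketch (my own toy example, not from any reference) that fits the same simple regression with both NUTS and mean-field ADVI and compares the posterior standard deviations; it assumes a recent PyMC (v4+ API, imported as `pymc`) and ArviZ:

```python
# Toy comparison of posterior widths from NUTS vs mean-field ADVI.
# Assumption: recent PyMC (v4+) and ArviZ are installed.
import numpy as np
import pymc as pm
import arviz as az

rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 1.5 * x + rng.normal(scale=1.0, size=200)

with pm.Model() as model:
    beta = pm.Normal("beta", 0.0, 10.0)
    sigma = pm.HalfNormal("sigma", 5.0)
    pm.Normal("obs", mu=beta * x, sigma=sigma, observed=y)

    # Full MCMC with NUTS
    idata_nuts = pm.sample(1000, tune=1000, random_seed=42)

    # Mean-field ADVI; draw from the fitted approximation for comparison
    approx = pm.fit(n=30_000, method="advi", random_seed=42)
    idata_advi = approx.sample(2000)

# The ADVI posterior for beta will typically come out a bit narrower
# than the NUTS posterior, illustrating the variance underestimation.
print(az.summary(idata_nuts, var_names=["beta"])["sd"])
print(az.summary(idata_advi, var_names=["beta"])["sd"])
```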

Hi @ckrapu,

Is there a way to do this ADVI initialization in PyMC?

Could you also elaborate more on the minibatched ADVI you mentioned in point 3?

Hey @jroberayalas,

  1. Initializing with ADVI is really easy. By default, pm.sample runs the NUTS sampler; passing init="advi" tells it to initialize with ADVI:
    trace = pm.sample(10000, init="advi")
    See pymc.init_nuts — PyMC 5.5.0 documentation for the full list of initialization options.

  2. ADVI works by stochastic gradient ascent on the ELBO. Ideally the ELBO (and its gradient) would be computed over the full dataset at every step, but with minibatching only a random subsample is used, scaled up so that it remains an unbiased estimate for the whole dataset. This lets it handle far more data points at the cost of noisier gradient estimates, a tradeoff the authors consider worthwhile (see section 4.3); a minimal sketch is below.
    https://arxiv.org/pdf/1603.00788.pdf
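Here is a minimal sketch of minibatched ADVI (my own example, not from the paper), assuming a recent PyMC with the v5-style pm.Minibatch API:

```python
# Minibatched ADVI sketch: each optimization step estimates the ELBO gradient
# from a random minibatch, scaled up via `total_size` to the full dataset.
# Assumption: recent PyMC (v5-style pm.Minibatch accepting multiple arrays).
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
N = 100_000
x_full = rng.normal(size=N)
y_full = 2.0 * x_full + rng.normal(size=N)

# Random minibatches of 500 observations, drawn jointly so x and y stay aligned
x_mb, y_mb = pm.Minibatch(x_full, y_full, batch_size=500)

with pm.Model() as model:
    beta = pm.Normal("beta", 0.0, 10.0)
    sigma = pm.HalfNormal("sigma", 5.0)
    pm.Normal("obs", mu=beta * x_mb, sigma=sigma,
              observed=y_mb, total_size=N)

    # Stochastic optimization of the ELBO; each step only sees one minibatch
    approx = pm.fit(n=20_000, method="advi")
    idata = approx.sample(1000)
```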
