What are the differences between NUTS and ADVI?

jroberayalas · April 18, 2020, 8:22am

Hi! I’m new to the Bayesian / PyMC3 world, and I keep coming across the NUTS and ADVI methods for doing inference. My understanding is that you should avoid using the find_MAP function and use one of these two instead. Can anyone summarize the differences between them? Is it fair to say that ADVI is faster (but less accurate) than NUTS? Any help (and useful references) would be highly appreciated!

ckrapu · April 24, 2020, 7:02pm

Adding my two cents having used the PyMC3 implementations of both on a range of nonstandard problems:

When ADVI is said to be “less accurate”, it is usually a general comment about the tendency of variational inference methods to mischaracterize posterior variance. For example, the ADVI posterior for a regression coefficient will often be too narrow (too concentrated). This is because VI methods such as ADVI minimize the KL divergence from the approximation to the true posterior, a loss function which asymmetrically favors too-narrow approximations.
I have often found, when working with weird models whose posterior geometries are so difficult that all MCMC methods fail, that initializing with ADVI first can help get the samplers to work. I suspect this is because poorly chosen initial values for the MCMC sampler lead to horrible numerical issues for Hamiltonian Monte Carlo; in many cases ADVI appears to be somewhat more robust.
Some problems are just too big for NUTS (even with a GPU), and ADVI is the only option for model fitting. I’ve used ADVI + GPU to train deep convolutional autoencoders with 10 million+ parameters and Bayesian regularization using the minibatched implementation of ADVI within PyMC3.

As a short summary, I think the worst use case for ADVI would be with a small dataset and complicated model structure while it is best used for very large models and datasets requiring minibatched computation.
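
The too-narrow behavior can be illustrated with a small worked example. This is a sketch under stated assumptions, not what PyMC does internally: the "posterior" is taken to be a zero-mean 2D Gaussian with correlation rho (a made-up target), and the approximation is a fully factorized Gaussian, as in mean-field ADVI. For this case the factorized variances that minimize KL(q || p) are the reciprocals of the diagonal of the precision matrix, which are smaller than the true marginal variances whenever rho is nonzero. The helper names are hypothetical.

```python
import math

def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def kl_q_p(s, cov):
    """KL(q || p) for q = N(0, s * I) and p = N(0, cov), both 2-dimensional."""
    prec = inverse_2x2(cov)
    det_cov = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    trace_term = s * (prec[0][0] + prec[1][1])
    return 0.5 * (trace_term - 2 + math.log(det_cov) - 2 * math.log(s))

rho = 0.9  # assumed posterior correlation, chosen for illustration
cov = [[1.0, rho], [rho, 1.0]]

# Closed-form mean-field optimum: variance = 1 / (diagonal of the precision matrix).
precision = inverse_2x2(cov)
vi_var = 1.0 / precision[0][0]
true_var = cov[0][0]

# Sanity check: a grid search over shared variances lands on the same value.
grid = [i / 100 for i in range(1, 151)]
best = min(grid, key=lambda s: kl_q_p(s, cov))

print(f"true marginal variance: {true_var:.2f}")
print(f"mean-field variance:    {vi_var:.2f}")
print(f"grid-search minimizer:  {best:.2f}")
```

With rho = 0.9 the factorized approximation reports a variance of 1 - rho**2, far below the true marginal variance of 1. The gap shrinks as the correlation goes to zero, which fits the observation that ADVI struggles most with complicated posterior structure.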

jroberayalas · March 13, 2023, 7:25pm

Hi @ckrapu ,

Is there a way to do this ADVI initialization in PyMC?

Can you elaborate more on this?

gregseljak · July 4, 2023, 11:46am

Hey @jroberayalas ,

Initializing with ADVI is really easy. By default, pm.sample uses the NUTS sampler (for continuous models), and the init argument selects how it is initialized:
trace = pm.sample(10000, init="advi")
See pymc.init_nuts in the PyMC 5.5.0 documentation.
ADVI works by following the gradient of the ELBO at each step of its optimization path. The ELBO is ideally computed across all data points, but with minibatching only a subsample is used to approximate it. This lets ADVI handle more data points at the cost of some precision, which the creators consider a worthwhile tradeoff (see section 4.3):
https://arxiv.org/pdf/1603.00788.pdf
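
The subsampling idea can be sketched in plain Python. This is not ADVI itself and not the PyMC implementation: it only shows the data-dependent part of the ELBO (the log-likelihood sum) being estimated from a minibatch and rescaled by N / batch_size. The Gaussian model, the synthetic data, and the function names are all assumptions made for illustration.

```python
import math
import random

def gaussian_loglik(x, mu, sigma=1.0):
    """Log-density of one observation under N(mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2

def minibatch_estimate(data, mu, batch_size, rng):
    """Estimate the full-data log-likelihood from a random subsample."""
    batch = rng.sample(data, batch_size)
    scale = len(data) / batch_size  # rescale so the estimate is unbiased
    return scale * sum(gaussian_loglik(x, mu) for x in batch)

rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(1000)]  # synthetic data set
mu = 1.5  # parameter value at which the likelihood is evaluated

# Exact value: one pass over all 1000 points.
full = sum(gaussian_loglik(x, mu) for x in data)

# Noisy estimates: each one touches only 50 points.
estimates = [minibatch_estimate(data, mu, 50, rng) for _ in range(2000)]
mean_estimate = sum(estimates) / len(estimates)

print(f"full-data log-likelihood:    {full:.1f}")
print(f"mean of minibatch estimates: {mean_estimate:.1f}")
print(f"range of single estimates:   {min(estimates):.1f} to {max(estimates):.1f}")
```

Each single estimate is noisy, but the estimates average out to the full-data value, so a stochastic optimizer can still make progress while touching only a fraction of the data per step. That is the precision-for-scale tradeoff described above.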
