Justification for ADVI convergence criterion?

Martin_Ingram · November 4, 2021, 1:40am

Hi everyone,

I’m currently working on a project involving variational inference, and as part of that, we’d like to compare a method against PyMC’s ADVI. Part of the challenge with ADVI is detecting convergence. In PyMC, this is handled with a callback, as described in the Quickstart (Variational API quickstart — PyMC3 3.11.4 documentation); the source code is here: pymc/callbacks.py at main · pymc-devs/pymc · GitHub. It appears that PyMC detects convergence by computing the relative difference between the parameter vectors at fixed intervals. By default, the infinity norm is used, which is equivalent to taking the maximum elementwise difference. The default tolerance seems to be 10^{-3}.
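To make the criterion described above concrete, here is a minimal NumPy sketch of a parameter-based check. It is an illustration under stated assumptions, not a copy of the code in pymc/callbacks.py: the function names `relative_change` and `params_converged` and the `eps` stabiliser are hypothetical, and the exact form of the relative difference in PyMC may differ.

```python
import numpy as np


def relative_change(current, previous, eps=1e-6):
    """Elementwise relative change between two flattened parameter vectors.

    `eps` is an assumed small constant that avoids division by zero.
    """
    current = np.asarray(current, dtype=float)
    previous = np.asarray(previous, dtype=float)
    return np.abs(current - previous) / (np.abs(previous) + eps)


def params_converged(current, previous, tolerance=1e-3, ord=np.inf):
    """Return True when the norm of the relative change is below `tolerance`.

    With ord=np.inf the norm is the largest elementwise relative change.
    """
    delta = relative_change(current, previous)
    return bool(np.linalg.norm(delta, ord) < tolerance)
```

In a training loop, such a check would be called every fixed number of iterations on the current and previously stored parameter vectors, stopping optimisation once it returns True.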

What I would like to know is whether there is any justification for this convergence criterion. Is there a paper that argues for it? It seems to differ from the one used in Stan, which instead computes the relative change in the ELBO over time and judges convergence using a running average and median, as discussed briefly in, e.g., [1802.02538] Yes, but Did It Work?: Evaluating Variational Inference. Was the parameter-based criterion found to perform better, and is that why it’s used instead? A citation or some justification would help us in writing up our results.
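For comparison, an ELBO-based check of the kind described above might look roughly like the following. This is a sketch under assumptions rather than Stan's actual implementation: the function name `elbo_converged`, the window length, the tolerance value, and the choice to normalise by the previous ELBO value are all illustrative.

```python
import numpy as np


def elbo_converged(elbo_history, tol_rel=0.01, window=10):
    """Judge convergence from the relative change of successive ELBO estimates.

    Convergence is declared when either the mean or the median of the most
    recent `window` relative changes falls below `tol_rel` (assumed values).
    """
    elbos = np.asarray(elbo_history, dtype=float)
    if elbos.size < 2:
        return False  # need at least two estimates to form a change
    # Relative change between consecutive ELBO estimates (assumes nonzero ELBO).
    rel = np.abs((elbos[1:] - elbos[:-1]) / elbos[:-1])
    recent = rel[-window:]
    return bool(np.mean(recent) < tol_rel or np.median(recent) < tol_rel)
```

Using both the mean and the median means a single large jump in a noisy ELBO estimate does not by itself prevent the check from firing, since the median is insensitive to it.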

Thanks for your help,
Best wishes,
Martin

ferrine · November 6, 2021, 8:10pm

Hi, I implemented the criterion a long time ago. At the time of writing, the motivation came mostly from a deep learning perspective. I first considered the loss as a metric to track for convergence, but its signal-to-noise ratio was quite poor. Then I tried tracking the parameters and found them easier to track from a practical point of view. Since this worked better, I contributed it to pymc3.

Martin_Ingram · November 7, 2021, 12:38am

Thank you very much Maxim!
