Justification for ADVI convergence criterion?

Hi everyone,

I’m currently working on a project involving variational inference, and as part of that, we’d like to compare a method against PyMC’s ADVI. Part of the challenge with ADVI is detecting convergence. In PyMC, this is handled with a callback, as described in the Quickstart (Variational API quickstart, PyMC3 3.11.4 documentation); the source code is in pymc/callbacks.py in the pymc-devs/pymc repository on GitHub. It appears that PyMC detects convergence by computing the relative difference between the parameter vectors at fixed intervals. By default, the infinity norm of this difference is used, which is equivalent to taking the largest elementwise (absolute) relative difference. The default tolerance seems to be 10^{-3}.
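To make the criterion concrete, here is a minimal NumPy sketch of the check as we read it from callbacks.py. It is not PyMC's code: the function names, the `eps` guard and its value, and the example parameter vectors are assumptions for illustration. In PyMC itself, the check is attached by passing a `CheckParametersConvergence` instance to `pm.fit` through `callbacks`.

```python
import numpy as np


def relative_diff(current: np.ndarray, prev: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Elementwise relative change between two parameter snapshots.

    eps (an assumed value) keeps the ratio finite when an entry of prev is zero.
    """
    return (np.abs(current - prev) + eps) / (np.abs(prev) + eps)


def params_converged(current: np.ndarray, prev: np.ndarray,
                     tolerance: float = 1e-3, ord=np.inf) -> bool:
    """True if the norm of the relative change is below tolerance.

    With ord=np.inf this is the largest elementwise relative change.
    """
    delta = relative_diff(current, prev)
    return bool(np.linalg.norm(delta, ord) < tolerance)


# Hypothetical flattened variational parameters (e.g. means and log-stds),
# snapshotted a fixed number of iterations apart.
prev = np.array([1.000, -2.000, 0.500])
current = np.array([1.0005, -2.001, 0.5002])
print(params_converged(current, prev))        # → True  (max relative change ≈ 5e-4)
print(params_converged(1.01 * prev, prev))    # → False (every entry moved by ~1%)
```

Because the infinity norm picks out the largest elementwise change, a single parameter that is still moving is enough to keep the optimization running.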

What I would like to know is whether there is any justification for this convergence criterion. Is there a paper that argues for it? It seems to differ from the criterion used in Stan, which instead tracks the relative change in the ELBO over time and judges convergence using a running mean and median, as discussed briefly in, e.g., “Yes, but Did It Work?: Evaluating Variational Inference” (arXiv:1802.02538). Was the parameter-based criterion found to perform better, and is that why it is used instead? A citation or some justification would help us in writing up our results.
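For comparison, here is a rough sketch of an ELBO-based rule in the spirit of the Stan criterion described above: relative changes between successive ELBO estimates go into a fixed-size buffer, and convergence is declared when either their mean or their median falls below a tolerance. The function name, window size, and trace values are hypothetical; `tol_rel_obj=0.01` is assumed to mirror Stan's default, and Stan's actual implementation details may differ.

```python
import statistics
from collections import deque
from typing import Optional, Sequence


def elbo_convergence_index(elbo_trace: Sequence[float], window: int = 10,
                           tol_rel_obj: float = 0.01) -> Optional[int]:
    """Index of the first ELBO evaluation at which the mean or median of the
    recent relative ELBO changes drops below tol_rel_obj; None if it never does.

    elbo_trace holds ELBO estimates taken at fixed intervals (e.g. every 100 steps).
    """
    changes: deque = deque(maxlen=window)  # circular buffer of recent relative changes
    for k in range(1, len(elbo_trace)):
        prev, curr = elbo_trace[k - 1], elbo_trace[k]
        # Relative change, scaled by the current ELBO (assumed nonzero).
        changes.append(abs((curr - prev) / curr))
        if (statistics.mean(changes) < tol_rel_obj
                or statistics.median(changes) < tol_rel_obj):
            return k
    return None


# Hypothetical ELBO trace (values made up for illustration).
trace = [-1000.0, -500.0, -300.0, -250.0, -248.0, -247.9, -247.85]
print(elbo_convergence_index(trace, window=3))  # → 5
```

In this toy trace the median test fires at index 5 while the mean of the buffered changes (about 0.07) is still well above the tolerance, which shows why the choice between mean and median matters when ELBO estimates are noisy.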

Thanks for your help,
Best wishes,

Hi, I implemented this criterion a long time ago. At the time, the motivation came mostly from a deep learning perspective. I first considered the loss as a metric to track for convergence, but it was quite noisy (its signal-to-noise ratio was poor). I then tried tracking the parameters instead and found them easier to track from a practical point of view. Since this worked better, I contributed it to PyMC3.


Thank you very much, Maxim!