Hi everyone,
I’m currently working on a project involving variational inference, and as part of it we’d like to compare a method against PyMC’s ADVI. Part of the challenge with ADVI is detecting convergence, which PyMC handles with a callback, as described in the quickstart (Variational API quickstart — PyMC3 3.11.4 documentation); the source code is here: pymc/callbacks.py at main · pymc-devs/pymc · GitHub. As far as I can tell, PyMC detects convergence by computing the relative difference between successive parameter vectors at fixed intervals. By default the infinity norm is used, i.e. convergence is judged by the maximum relative change across all parameters, and the default tolerance appears to be 10^{-3}.
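For concreteness, this is how I’m using the callback, following the quickstart (the toy model and data are just for illustration), together with a small reimplementation of the check as I read it. The `eps` smoothing in my helper is my own guess at how division by zero is avoided, so it may not match the actual implementation exactly:

```python
import numpy as np
import pymc3 as pm

# Toy data and model purely to exercise the callback.
data = np.random.normal(loc=1.0, scale=2.0, size=500)

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)
    sigma = pm.HalfNormal("sigma", sigma=10.0)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=data)

    # Stop ADVI when the relative change in the flattened parameter vector,
    # measured in the infinity norm, drops below 1e-3, checked every 100
    # iterations (which I believe are the defaults).
    approx = pm.fit(
        n=100_000,
        method="advi",
        callbacks=[pm.callbacks.CheckParametersConvergence(every=100, tolerance=1e-3)],
    )


def relative_parameters_converged(current, prev, tolerance=1e-3, ord=np.inf, eps=1e-6):
    """My reading of the criterion: the relative change between successive
    flattened parameter vectors, measured in the given norm (infinity norm,
    i.e. the maximum component, by default), is compared against the
    tolerance. The eps guard against division by zero is my own assumption."""
    delta = (np.abs(current - prev) + eps) / (np.abs(prev) + eps)
    return np.linalg.norm(delta, ord) < tolerance
```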
What I would like to know is whether there is any justification for this convergence criterion. Is there a paper that argues for it? It seems to differ from the criterion used in Stan, which instead computes the relative change in the ELBO over time and judges convergence using a running average and median of those changes, as discussed briefly in, e.g., [1802.02538] Yes, but Did It Work?: Evaluating Variational Inference. Was the parameter-based criterion found to perform better, and is that why it is used instead? A citation or some other justification would help us in writing up our results.
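For comparison, my rough, paraphrased understanding of the Stan-style ELBO criterion is something like the sketch below; this is not Stan’s actual code, and the `window` size and `tol_rel_obj=0.01` are my assumptions about the defaults:

```python
import numpy as np

def stan_style_converged(elbo_history, window=10, tol_rel_obj=0.01):
    """Sketch of an ELBO-based check as I understand Stan's approach:
    track the relative change in the ELBO between successive evaluations,
    keep a short window of those changes, and declare convergence when the
    running mean or the running median of the window falls below
    tol_rel_obj. This is a paraphrase for comparison only."""
    if len(elbo_history) < window + 1:
        return False
    elbo = np.asarray(elbo_history, dtype=float)
    recent = elbo[-(window + 1):]
    rel_change = np.abs(np.diff(recent) / recent[:-1])
    return (np.mean(rel_change) < tol_rel_obj) or (np.median(rel_change) < tol_rel_obj)
```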
Thanks for your help,
Best wishes,
Martin