I wanted to try replacing ADVI with SVGD in my model, but there seems to be no easy way to track its convergence. I suppose I would have to look at the gradient for each particle (?), but how do I extract this information in a callback passed to the `fit` function?
It's a bit hacky, but here is how you can implement your own tracker:
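```python
from collections import OrderedDict, defaultdict
from itertools import count

import pymc3 as pm
import theano
import theano.tensor as tt

# pm.callbacks.Tracker uses a key -> callable mapping to record statistics.
# The function below computes the gradients of the objective w.r.t. the
# approximation parameters and wraps each tensor's `eval` method as a callable.
@theano.configparser.change_flags(compute_test_value='off')
def get_tracker(inference):
    # One counter per parameter name, so repeated names get unique suffixes
    numbers = defaultdict(count)
    params = inference.approx.params
    grads = tt.grad(inference.objective(None), params)
    names = ['%s_%d' % (v.name, next(numbers[v.name])) for v in params]
    # Record both the parameter values and their gradients at each step
    return pm.callbacks.Tracker(**OrderedDict(
        [(name, v.eval) for name, v in zip(names, params)] +
        [('grad_' + name, v.eval) for name, v in zip(names, grads)]
    ))

# Usage: tracker = get_tracker(inference)
#        approx = inference.fit(n, callbacks=[tracker])
#        tracker['grad_<name>_0'] then holds the recorded gradient history
```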
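Once the tracker has recorded gradient snapshots, you still need to reduce them to something you can look at. A minimal sketch, assuming each recorded gradient is an array of shape `(n_particles, dim)` (the helper names `particle_grad_norms` and `worst_particle_trace` are hypothetical, not part of PyMC3): compute the norm of each particle's gradient at every step, then follow the largest one over time.

```python
import numpy as np


def particle_grad_norms(grad_history):
    """Per-particle gradient norms for each recorded step.

    grad_history: sequence of arrays, each assumed to have shape
    (n_particles, dim). Returns an array of shape (n_steps, n_particles).
    """
    norms = []
    for g in grad_history:
        g = np.asarray(g, dtype=float)
        # Flatten everything after the particle axis, then take L2 norms
        norms.append(np.linalg.norm(g.reshape(g.shape[0], -1), axis=1))
    return np.stack(norms)


def worst_particle_trace(grad_history):
    """Largest per-particle gradient norm at each step (one curve to plot)."""
    return particle_grad_norms(grad_history).max(axis=1)


if __name__ == "__main__":
    # Synthetic history: 3 steps, 4 particles, 2 dims, gradients shrinking
    history = [np.tile([3.0, 4.0], (4, 1)) * f for f in (1.0, 0.5, 0.0)]
    print(worst_particle_trace(history))
```

With the tracker above you would pass something like `tracker['grad_<name>_0']`; the exact key depends on the parameter names of your approximation.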
However, I doubt this is the best way to track convergence. I think one should instead use the Earth Mover's Distance between particle sets at successive iterations. Optimizers often include a momentum term, which can cause spurious oscillations. That is a hand-waving argument you can ignore, but there is a stronger one: this optimization problem has infinitely many optima, so the optimizer has a hard time telling that it has converged.
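A minimal sketch of the Earth Mover's Distance idea, assuming the particles are available as an `(n_particles, dim)` NumPy array (for example, a snapshot of the SVGD particles taken every few iterations). For two equally sized, uniformly weighted particle sets, the optimal transport plan is a permutation, so the EMD reduces to an assignment problem that `scipy.optimize.linear_sum_assignment` solves exactly. The names `particle_emd` and `has_converged` and the tolerance `1e-3` are hypothetical choices for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def particle_emd(a, b):
    """Earth Mover's Distance between two equally sized, uniformly
    weighted particle sets of shape (n_particles, dim)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("particle sets must have the same shape")
    # Pairwise Euclidean ground distances between particles
    cost = cdist(a, b)
    # Equal sizes + uniform weights: the optimal plan is a permutation
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()


def has_converged(prev, curr, tol=1e-3):
    """Hypothetical stopping rule: the particle cloud barely moved."""
    return particle_emd(prev, curr) < tol


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 2))
    # Reordering particles leaves the distribution unchanged
    print(particle_emd(x, x[::-1]))
    # Shifting every particle by the same vector moves the cloud by its length
    print(particle_emd(x, x + np.array([0.5, 0.0])))
```

Because this distance compares the particle clouds as distributions, it ignores how the particles are ordered and only reacts when the cloud as a whole moves.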