Variational API: meaning of parameters

I’m trying to understand what the different fitting parameters mean, as it’s not completely clear from the API reference. Could anyone explain a bit, please?
Namely, I’m confused by the existence of two different optimizers, obj_optimizer and test_optimizer. Which one should I set to modify, e.g., the optimization learning rate? And overall, the whole concept of the test function is not clear to me: why is the objective alone not enough?
Another thing is obj_n_mc and tf_n_mc: why do we need Monte Carlo to approximate the gradient of the objective function? Since everything is written as Theano functions, it should be automatically differentiable, right?

Hi, I was guided by the OPVI paper (arXiv) when implementing and unifying VI in PyMC3. This theoretical framework assumes an objective function that is minimized and a test function that is maximized. The authors also proposed a novel approach for implicit VI, but I did not implement it, as I did not find it that promising. I believe the framework is very useful, so I decided to keep it as is, leaving free space for new methods in the future. One can pick up OPVI and implement their own approach; I hope there is enough flexibility.

obj_n_mc – number of Monte Carlo samples used to estimate the objective function gradient
tf_n_mc – number of Monte Carlo samples used to estimate the test function gradient
test_optimizer – optimizer for the test function
obj_optimizer – optimizer for the objective function
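In practice, all of these are passed as keyword arguments to fit. A minimal sketch, assuming the pm.fit / pm.adam interface of PyMC3 (the model and the particular values are just for illustration):

```python
import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal('mu', 0., 1.)
    pm.Normal('obs', mu, 1., observed=[0.1, -0.3, 0.2])

    # ADVI minimizes the negative ELBO; obj_optimizer controls its updates.
    approx = pm.fit(
        n=10000,
        method='advi',
        obj_optimizer=pm.adam(learning_rate=1e-2),  # step rule for the objective
        obj_n_mc=10,  # Monte Carlo samples per gradient estimate
    )
```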

All the KL-based methods we have rely on the objective function itself; for them the test function is the identity and thus is not involved in VI.
By the way, we have one method that I could reformulate as an OPVI special case: the recently proposed Stein variational gradient descent (arXiv), which has a nonparametric test function. Since that test function has an analytical optimum, it does not require any optimization.
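For completeness, a hypothetical call would look something like this (SVGD is selected via the method argument; the particle count is purely illustrative):

```python
with model:
    # SVGD updates a set of particles by an analytical (kernelized) rule,
    # so no test_optimizer is involved.
    approx = pm.fit(n=1000, method='svgd', inf_kwargs=dict(n_particles=100))
```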

Finally, we do not yet need a parametric test function in PyMC3, but this may change in the future.


So this means I only need to change obj_optimizer to adjust, e.g., the learning rate, right?
And as for obj_n_mc: I saw the definition “number of Monte Carlo samples to estimate the objective function gradient” in the reference, but it does not shed much light. I mean, why do we need to estimate the gradient at all? Theano provides the exact expression for it.

Variational inference uses the ELBO as the objective function, which is an expectation over some space. Computing that expectation is difficult because it is a high-dimensional integral. The reparameterization trick, as used in PyMC3 and other mainstream packages, relies on samples to compute the expectation, and on the (sample) expectation of the gradient as a substitute for the gradient of the expectation. You can find good references in https://arxiv.org/pdf/1610.02287.pdf (see equations 1 and 3) and a more recent paper, https://arxiv.org/pdf/1805.08498.pdf (also see eq. 1 and 3).
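To make this concrete, here is a toy sketch in plain NumPy (not PyMC3 internals) of a reparameterized Monte Carlo gradient estimate for a single Gaussian q; the integrand f and all values are illustrative, and n_mc plays the role of obj_n_mc:

```python
import numpy as np

# Estimate d/d(mu, log_sigma) E_{q(z)}[f(z)] with q(z) = N(mu, sigma^2)
# by writing z = mu + sigma * eps, eps ~ N(0, 1), and averaging over samples.

def f(z):
    return z ** 2          # any differentiable integrand


def grad_f(z):
    return 2 * z


def reparam_grad(mu, log_sigma, n_mc=10, seed=0):
    rng = np.random.default_rng(seed)
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(n_mc)                  # n_mc Monte Carlo samples
    z = mu + sigma * eps                             # reparameterized samples
    d_mu = grad_f(z).mean()                          # df/dz * dz/dmu, averaged
    d_log_sigma = (grad_f(z) * sigma * eps).mean()   # chain rule through z
    return d_mu, d_log_sigma


# For f(z) = z^2, E[f] = mu^2 + sigma^2, so the exact gradients at
# mu=0.5, sigma=1 are (1.0, 2.0); the estimate approaches them as n_mc grows.
print(reparam_grad(0.5, np.log(1.0), n_mc=10000))
```

So the gradient itself is still exact per sample (Theano differentiates the reparameterized expression); what is approximated by Monte Carlo is the expectation of that gradient, and obj_n_mc controls how many samples go into that average.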

Oh, now I see, thank you!