Hello!
I’m writing to ask for advice on creating a new Operator class for operator variational inference (OPVI) in pymc3.
Specifically, I’d like to use the squared L2 norm of the difference between the score function of the posterior and the variational approximation:
\mathcal{L}(q,p)= \mathbb{E}_{\theta \sim q(\theta)}[\| \nabla_\theta \log p(\theta)p(x|\theta) - \nabla_\theta \log q(\theta)\|_2^2]
where \theta denotes the free (latent) random variables for which I’d like to fit a variational approximation, and x denotes the observed variables.
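To make the objective concrete (in case it helps clarify what I’m after), here is a toy NumPy sketch, entirely my own illustration and not pymc3 code, that estimates \mathcal{L}(q,p) by Monte Carlo in the simple case where both the target and q(\theta) are 1-D Gaussians with known scores:

```python
import numpy as np

def gaussian_score(theta, mu, sigma):
    # Score of a 1-D Gaussian: d/d(theta) log N(theta; mu, sigma^2)
    return -(theta - mu) / sigma**2

def fisher_divergence_mc(mu_p, sigma_p, mu_q, sigma_q, n_samples=100_000, seed=0):
    # Monte Carlo estimate of E_{theta ~ q}[ (score_p(theta) - score_q(theta))^2 ]
    rng = np.random.default_rng(seed)
    theta = rng.normal(mu_q, sigma_q, size=n_samples)
    diff = gaussian_score(theta, mu_p, sigma_p) - gaussian_score(theta, mu_q, sigma_q)
    return np.mean(diff**2)

# The objective is exactly zero when q matches the target,
# and strictly positive otherwise:
print(fisher_divergence_mc(0.0, 1.0, 0.0, 1.0))  # -> 0.0
print(fisher_divergence_mc(0.0, 1.0, 0.5, 1.2))  # > 0
```

(The actual operator would of course need the scores as Theano graph gradients rather than closed-form expressions, which is where I’m stuck.)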
This has a similar flavor to the usual ELBO, \mathbb{E}_{\theta \sim q(\theta)}[ \log p(\theta)p(x|\theta) - \log q(\theta)], so I was hoping it would be straightforward to implement by modeling it after the `KL` operator in `variational/operators.py` and overriding `__init__()` and `apply()` in the new class. However, I keep getting lost trying to track where the relevant variables live and how to properly compute the necessary gradients.
More specifically, my first pass was to set `self.approx = approx` in `__init__`, and in `apply` compute the objective as:

```python
dlogp = tt.grad(self.approx.datalogp + self.approx.varlogp, self.approx.symbolic_randoms)
dlogq = tt.grad(self.approx.logq, self.approx.symbolic_randoms)
score_diff = [dlogp_i - dlogq_i for (dlogp_i, dlogq_i) in zip(dlogp, dlogq)]
return sum(tt.sum(tensor ** 2) for tensor in score_diff)
```
I then tried to use this in essentially the same way that `KL` is used for ADVI. However, this gives me a `DisconnectedInputError` on the second line above (the `dlogq` computation), which I think may be due to calling `tt.grad` on variables from different cloned (sub-)graphs.
Any suggestions on how to proceed with this would be much appreciated, even if just pointers to relevant documentation.
In case it’s relevant: I’ve used Theano in the past but am new to pymc3 (I’ve mostly used TensorFlow Probability and Stan).
Many thanks in advance!