ADVI giving NaN values for the ELBO

I am trying to use ADVI on radial velocity data from this paper, which specifies the Bayesian model. I need to find a parameter E that satisfies Kepler's equation M = E - e * sinE, where M and e are parameters defined earlier in the model. I know the gradients of E: dE/dM = 1 / (1 - e * cosE) and dE/de = sinE * dE/dM.
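For reference, these follow from implicitly differentiating Kepler's equation with respect to M and e:

d/dM:  1 = dE/dM - e * cosE * dE/dM  =>  dE/dM = 1 / (1 - e * cosE)
d/de:  0 = dE/de - sinE - e * cosE * dE/de  =>  dE/de = sinE / (1 - e * cosE) = sinE * dE/dM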
I am trying to make a custom distribution in PyMC3 for this but haven't been able to so far. My ELBO keeps getting NaN values, which I suspect is because the model is not using the gradients of E. In my model E is further transformed into a variable f using arctan2. Refer to the paper for more details on the exact nature of my parameters.

def computeE(M, e):
    # Solve Kepler's equation M = E - e*sin(E) for E with Newton's method,
    # unrolled symbolically for a fixed number of iterations
    E = M
    for i in range(200):
        g = E - e * t.sin(E) - M
        gp = 1 - e * t.cos(E)
        E = E - g / gp
    # my attempt at attaching the analytic gradients dE/dM and dE/de
    computeE.grad = lambda inputs, out_grad: [1 / (1 - inputs[1] * t.cos(E)),
                                              t.sin(E) / (1 - inputs[1] * t.cos(E))]
    return E
    
    
with one_planet_model:
    V = pm.Uniform('V',lower=-2000,upper=2000)
    e = pm.Uniform('e',lower=0,upper=1)
    pomega = pm.Uniform('pomega',lower=0,upper=2*np.pi)
    chi = pm.Uniform('chi',lower=0,upper=1)
    T = pm.DensityDist('T', lambda value: -t.log(value) - np.log(np.log(15000 * 5)))
    K = pm.DensityDist('K', lambda value: -t.log(value + 1) - np.log(2001))
    s = pm.DensityDist('s', lambda value: -t.log(value + 1) - np.log(2001))
    sd = t.sqrt(err1**2 + s**2)
    M = 2 * np.pi * ((jd / T + chi) % 1)

    E = computeE(M,e)
    f = pm.Deterministic('f', 2 * t.arctan2(np.sqrt(1 + e) * np.sin(E / 2), np.sqrt(1 - e) * np.cos(E / 2)))
    mu = V - K * (np.sin(f + pomega) + e * np.sin(pomega))
    rv_obs = pm.Normal('rv_obs',mu=mu,sd=sd,observed=rv1)

I also have a Theano Op written as a class, but I don't know how to use it in my model. Any help will be much appreciated.

class myOp(theano.Op):
    __prop__ = ()
    itypes=[t.dvector, t.dscalar]
    otypes=[t.dvector]
    def perform(self, node, inputs, outputs):
        M,e = inputs
        E0, E = M,M
        for i in range(200):
            g = E0 - e*t.sin(E0) - M
            gp = 1 - e*t.cos(E0)
            E = E - g / gp
        outputs = E
    
    def grad(self, inputs, output_gradients):
        M,e = inputs
        E0, E = M,M
        for i in range(200):
            g = E0 - e*t.sin(E0) - M
            gp = 1 - e*t.cos(E0)
            E = E - g / gp
        dE_dm = 1/(1-e*t.cos(E))
        dE_de = t.sin(E)*dE_dm
        return [dE_dm*output_gradients[0],dE_de*output_gradients[1]]

@aseyboldt implemented something very similar (he is trying to write a blog post on this). You can have a look at his Gist for some inspiration.

The notebook is a PyMC3 port of this blog post: http://dan.iel.fm/posts/stan-c++/
I didn't really get ADVI initialization working either, so I got the scaling from previous NUTS chains.
About your implementation:
The Op is the right idea, but grad works differently from perform. perform gives you numpy arrays, so you can do arbitrary operations in there (small bug: you need to set the contents of the output storage, e.g. outputs[0][0] = E, or the caller can't access your result).
grad on the other hand needs to return theano variables. You often need the value of the output in the gradient; you can get it with val = self(*inputs).
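Roughly, a fixed-up version could look like the sketch below (untested; KeplerOp is just the name I'm using here, and I've also changed the Newton iteration so it actually updates the current estimate). The key points are that perform works on numpy arrays and writes into outputs[0][0], while grad builds theano expressions and returns one gradient per input:

import numpy as np
import theano
import theano.tensor as t

class KeplerOp(theano.Op):
    __props__ = ()
    itypes = [t.dvector, t.dscalar]  # M (vector of mean anomalies), e (scalar eccentricity)
    otypes = [t.dvector]             # E (vector of eccentric anomalies)

    def perform(self, node, inputs, outputs):
        # inputs are numpy arrays here, so use numpy, not theano
        M, e = inputs
        E = M.copy()
        for _ in range(200):
            E = E - (E - e * np.sin(E) - M) / (1 - e * np.cos(E))
        outputs[0][0] = E  # write the result into the output storage

    def grad(self, inputs, output_gradients):
        # everything in grad is symbolic; reuse the Op's own output for E
        M, e = inputs
        E = self(M, e)
        dE_dM = 1 / (1 - e * t.cos(E))
        dE_de = t.sin(E) * dE_dM
        g, = output_gradients
        # one gradient per input; e is a scalar, so sum over the data points
        return [g * dE_dM, t.sum(g * dE_de)]

In the model you would then use something like E = KeplerOp()(M, e) instead of computeE(M, e).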

That gist did help a lot. I will finish the implementation over the next few days. Thanks for sharing it and for the explanation. It would be nice if you could explain how custom Ops work with PyMC3.

As for ADVI, I wanted to try parameter estimation using ADVI and see what results I get. You didn't get it working even with the Op?

There is a section about writing custom ops in the pymc3 doc:
https://pymc-devs.github.io/pymc3/notebooks/getting_started.html#Arbitrary-deterministics
And a longer version in the theano doc:
http://deeplearning.net/software/theano/extending/extending_theano.html

I realize this isn't entirely easy to follow, but I can't really improve on it in a couple of lines here. If you have concrete questions, those would be easier to answer.

About ADVI: I ran into NaN values near the start of the ADVI run, and at the moment the optimizer stops if that happens. I think the reason is that the Kepler equation had very strange solutions for some parameter combinations just after the optimization started. I didn't try very hard to get around this, however; maybe it would have been enough to just decrease the learning rate.
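If you want to try that, something along these lines should work with a recent PyMC3 version (just a sketch; the iteration count and learning rate are guesses you would have to tune):

with one_planet_model:
    # try a smaller learning rate for the stochastic optimizer
    approx = pm.fit(n=50000, method='advi',
                    obj_optimizer=pm.adagrad_window(learning_rate=1e-4))
    trace = approx.sample(2000)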

I think your notebook, along with the documentation you mentioned, should resolve my issue.

But I think for someone less well acquainted with Theano this could be a big roadblock, as making a custom Op and using it in PyMC3 is not very straightforward. What I meant was that it would be helpful if your blog post detailed the custom Op; it is confusing to use when one doesn't understand what each function is doing. I will also try to contribute an example with details.

Thank you for your help nonetheless.