The current approximation of RV "x" is NaN

I am trying to use SVGD in pymc3 to infer the parameters of an ODE model. Whenever I set the number of particles greater than 10, or increase the number of iterations to 20000 instead of the default 10000, I get the error below after completing about 90% of the iterations.

FloatingPointError: NaN occurred in optimization. 
The current approximation of RV `tau0_interval__`.ravel()[0] is NaN.
The current approximation of RV `tau0_interval__`.ravel()[0] is NaN.
The current approximation of RV `tau0_interval__`.ravel()[0] is NaN.
The current approximation of RV `tau0_interval__`.ravel()[0] is NaN.
The current approximation of RV `tau0_interval__`.ravel()[0] is NaN.
...
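
For reference, the fit call looks roughly like this (just a sketch: the ODE model definition is omitted, and the particle and iteration counts are simply the settings where the failure starts to appear):

import pymc3 as pm

with model:  # model context containing the ODE likelihood and priors such as tau0
    # fails after roughly 90% of the iterations once
    # n_particles > 10 or n is raised to 20000
    approx = pm.fit(n=20000, method=pm.SVGD(n_particles=20))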

I haven't found any other threads reporting this exact behaviour. I did find a few threads with the error "NaN occurred in optimization", but in those cases the error was thrown on the first iteration and was resolved with better initial values, unlike in my case, where the error is thrown in the last 10% of the iterations.

Any suggestions or pointers on what could cause this error?

Thanks!

My guess is that the approximation is near a (local) optimum after some number of iterations. Try setting the number of iterations to a smaller value (so that it won't crash), and plot approx.hist and the parameter values.
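
For example, something along these lines (a rough sketch; `model`, the particle count, and the iteration count are placeholders for your setup):

import matplotlib.pyplot as plt
import pymc3 as pm

with model:
    svgd = pm.SVGD(n_particles=20)
    approx = svgd.fit(2000)        # fewer iterations, so it should finish

plt.plot(approx.hist)              # objective history, as suggested above (may be empty if no loss is tracked)
trace = approx.sample(1000)        # draw samples to inspect the current parameter values
print(pm.summary(trace))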

Thank you for the suggestion! I did try a smaller number of iterations (8000); however, sometimes it completes all 8000 successfully and sometimes it crashes. Based on your guess, I suppose this could happen if the optimizer reaches the local optimum faster on some runs. Is that possible with SVGD (initialized manually)?

And if so, I would like to run 1000 iterations at a time so that I at least have the state of the optimizer before the crash. Is this possible with pymc3 (I mean saving the state of the optimizer and resuming from the last state)?

That's likely, since in SVGD the particles are sampled randomly.

It's not easy to save the state, but you can stop the optimization and then continue it later:

approx = advi.fit(1000)   # `advi` here is the Inference object (ADVI, SVGD, ...)
advi.refine(10000)        # train for an additional 10000 iterations
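
The same pattern works for SVGD (or any other Inference object). A minimal sketch, assuming a model context named `model` and 20 particles as an example:

import pymc3 as pm

with model:
    svgd = pm.SVGD(n_particles=20)
    approx = svgd.fit(1000)      # first 1000 iterations
    for _ in range(9):
        svgd.refine(1000)        # continue in 1000-iteration chunks
        # `approx` is the same object throughout, so you can inspect it
        # here (e.g. approx.sample(500)) before the next chunk runs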