How to track a 'nan energy'?

I’m building a complex model that has a convergence problem. When initializing with NUTS, I get this error message:

ValueError: Bad initial energy: nan. The model might be misspecified.

How can I find out which random variable produces this ‘nan’ energy?

So far my workaround is to use a Metropolis step for one particular variable instead of NUTS, but this is not satisfying. I suspect that this variable is not the real problem, because the error persists even if I replace it with a constant (with a reasonable value).

I printed the logp value at the test point for each of my random variables:

original_ctr_logodds__ -1.3862943611198906
ctr_logodds__ -138.62943611198918
ctr_prior_interval__ -794407.2905335968
w_stickbreaking__ -5.963065072586285
ctr_pp 207.9293698687659
original_ctr_pp 2.079293698687664
original_conv -31871.982096691176
conv -3180114.173934952
original_conv_pp -31871.98209669117
conv_pp -3180114.173934952
-7218315.572728861

(The last value is the logp of the whole model.)

None of them is Inf or NaN. I’m a bit surprised to see positive logp values, but I’m not sure whether that is a problem (?).

How can I get more information about this ‘nan energy’?

The error message should have been improved recently; are you on master? It should print a bit more information about which RV is nan.

I’ve updated pymc3 with this line:

pip install git+https://github.com/pymc-devs/pymc3

(I guess that is “master”?)

But I still get no clue about which RV goes wrong…

If the energy problem does not come from an invalid start value (i.e., model.test_point) causing a non-finite logp, it is usually due to a non-finite gradient. It can be difficult to diagnose, so here are all the steps you can take to identify the problem:

import pymc3 as pm

with pm.Model() as model:
    ...  # your model definition

# make sure all test values are finite
print(model.test_point)

# make sure all logp are finite
print(model.check_test_point())

with model:
    step = pm.HamiltonianMC()

q0 = step._logp_dlogp_func.dict_to_array(model.test_point)
p0 = step.potential.random()
# make sure the momentum drawn from the potential is finite
print(p0)

start = step.integrator.compute_state(q0, p0)
print(start.energy)

# make sure model logp and its gradients are finite
logp, dlogp = step.integrator._logp_dlogp_func(q0)
print(logp)
print(dlogp)

# make sure velocity is finite
v = step.integrator._potential.velocity(p0)
print(v)
kinetic = step.integrator._potential.energy(p0, velocity=v)
print(kinetic)

Any time you see an array containing non-finite elements, you can map it back into a dict to see which RV is causing the problem. For example, say dlogp contains non-finite values:

step._logp_dlogp_func.array_to_dict(dlogp)

Then adjust the prior for that RV accordingly.
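Once an array has been mapped back to a dict, scanning it by eye gets tedious for large models. Here is a minimal sketch of a helper that lists the non-finite entries per variable. The names `flatten` and `nonfinite_report` are made up for this illustration and are not part of pymc3, and the numbers in `example` are invented (only the variable names are borrowed from the thread). It uses the standard library only and assumes each value is a scalar or a (possibly nested) sequence or array of numbers.

```python
import math


def flatten(value):
    """Yield every scalar in a number or an arbitrarily nested sequence/array."""
    try:
        items = iter(value)
    except TypeError:
        # Not iterable (plain float, 0-d array): treat it as a single scalar.
        yield float(value)
        return
    for item in items:
        yield from flatten(item)


def nonfinite_report(named_values):
    """Map each variable name to the flat indices of its nan/inf entries.

    Variables whose entries are all finite are left out of the result.
    """
    report = {}
    for name, value in named_values.items():
        bad = [i for i, x in enumerate(flatten(value)) if not math.isfinite(x)]
        if bad:
            report[name] = bad
    return report


# Invented values, standing in for the dict returned by array_to_dict(dlogp).
example = {
    "ctr_logodds__": [0.1, -2.3],
    "w_stickbreaking__": [float("nan"), 0.5, float("inf")],
    "conv": 1.0,
}
print(nonfinite_report(example))  # {'w_stickbreaking__': [0, 2]}
```

In the workflow above you would pass it the dict from step._logp_dlogp_func.array_to_dict(dlogp); the indices refer to positions in the flattened array of each variable.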

Hope this is clear!

Thanks for all these tricks…

I found no divergent parameters…

The error disappeared when I switched from the default initializer, ‘jitter+adapt_diag’, to ‘adapt_diag’ (alone, without the jitter).

I guess that the jitter trick somehow pushes some parameter outside its legitimate support…?

Does that make sense?

Yep, that makes sense - that can happen.
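To illustrate the effect with something runnable, here is a toy sketch (not the model from this thread). It assumes a made-up two-parameter log-density, `toy_logp`, that is only defined where b > a, and it mimics the uniform jitter in [-1, 1] that the ‘jitter+adapt_diag’ initializer is documented to add to the starting point. The helper `count_bad_starts` is also hypothetical. The unjittered start is fine, while a share of the jittered starts lands where the logp is nan.

```python
import math
import random


def toy_logp(a, b):
    """Hypothetical log-density that is only defined where b > a."""
    if b <= a:
        return float("nan")  # outside the region where the density is defined
    return math.log(b - a) - 0.5 * (a * a + b * b)


def count_bad_starts(start, width, n_tries=1000, seed=0):
    """Count jittered copies of `start` at which toy_logp is not finite.

    Each coordinate gets independent uniform noise in [-width, width].
    """
    rng = random.Random(seed)
    bad = 0
    for _ in range(n_tries):
        a, b = (x + rng.uniform(-width, width) for x in start)
        if not math.isfinite(toy_logp(a, b)):
            bad += 1
    return bad


start = (0.0, 0.5)
print(toy_logp(*start))                     # finite at the unjittered start
print(count_bad_starts(start, width=0.0))   # no jitter: no bad starts
print(count_bad_starts(start, width=1.0))   # jitter in [-1, 1]: some bad starts
```

In pymc3 itself the switch described above is made through the init argument, e.g. pm.sample(init='adapt_diag').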
