How to track a 'nan energy'?


#1

I’m building a complex model that has a convergence problem. When initializing with NUTS I get this error message:

ValueError: Bad initial energy: nan. The model might be misspecified.

How can I get clues about which random variable produces such a ‘nan’ energy?

So far my workaround is to use a Metropolis step for a given variable instead of NUTS, but that is not satisfying. I suspect this variable is not the actual problem, because the error persists even when I replace it with a constant (at a reasonable value).

I printed the logp value of the test point for each of my random variables:

original_ctr_logodds__ -1.3862943611198906
ctr_logodds__ -138.62943611198918
ctr_prior_interval__ -794407.2905335968
w_stickbreaking__ -5.963065072586285
ctr_pp 207.9293698687659
original_ctr_pp 2.079293698687664
original_conv -31871.982096691176
conv -3180114.173934952
original_conv_pp -31871.98209669117
conv_pp -3180114.173934952
-7218315.572728861

(last value is the whole model logp)

None of them is Inf or NaN. I’m a bit surprised to see positive logp values, but I’m not sure whether that is a problem (?).
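For what it’s worth, a positive logp on its own is not an error: for continuous variables logp is a log *density*, and a density can exceed 1, so its log can be positive. A quick standalone illustration with scipy (used here only for demonstration, not part of the model above):

```python
import numpy as np
from scipy import stats

# A continuous density can exceed 1: a Normal with a small sigma is
# tall and narrow around its mean, so its log density is positive there.
logp = stats.norm(loc=0.0, scale=0.1).logpdf(0.0)
print(logp)  # log(1 / (0.1 * sqrt(2*pi))) ≈ 1.3836

# Only non-finite values (inf/nan) indicate a real problem.
assert np.isfinite(logp) and logp > 0
```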

How can I get more information about this ‘nan energy’?


#2

The error message was improved recently; are you on master? It should print a bit more information about which RV is NaN.


#3

I’ve updated pymc3 with this line :

pip install git+https://github.com/pymc-devs/pymc3

(I guess that is “master”?)

But I still get no clues about which RV goes mad…


#4

The energy problem, if it is not from an invalid start value (i.e., model.test_point) causing a non-finite logp, is usually due to the gradient being non-finite. It can be difficult to diagnose, so here are all the steps to identify the problem:

with pm.Model() as model:
    # your model definition

# make sure all test_value are finite
print(model.test_point)

# make sure all logp are finite
model.check_test_point()

with model:
    step = pm.HamiltonianMC()

q0 = step._logp_dlogp_func.dict_to_array(model.test_point)
p0 = step.potential.random()
# make sure the potentials are all finite
print(p0)

start = step.integrator.compute_state(q0, p0)
print(start.energy)

# make sure model logp and its gradients are finite
logp, dlogp = step.integrator._logp_dlogp_func(q0)
print(logp)
print(dlogp)

# make sure velocity is finite
v = step.integrator._potential.velocity(p0)
print(v)
kinetic = step.integrator._potential.energy(p0, velocity=v)
print(kinetic)

Any time you see an array containing non-finite elements, you can map it back into a dict to see which RV is causing the problem. For example, say dlogp contains a non-finite value:

step._logp_dlogp_func.array_to_dict(dlogp)

Then adjust the prior for that RV accordingly.
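The idea behind `array_to_dict` can also be sketched in plain numpy, if you want to see the mechanics: locate the non-finite entries in the flat gradient array and map them back to named slices. The variable names, values, and layout below are made up purely for illustration:

```python
import numpy as np

# Hypothetical flat gradient array, like the dlogp returned by
# step._logp_dlogp_func(q0); values are invented for this example.
dlogp = np.array([0.5, -1.2, np.nan, 3.0, np.inf, 0.1])

# Slices describing how each RV's values are laid out in the flat array
# (pymc3 builds this mapping internally; we mimic it by hand here).
layout = {"mu": slice(0, 2), "sigma_log__": slice(2, 5), "w": slice(5, 6)}

# Keep only the RVs whose gradient contains a non-finite element.
bad = {name: dlogp[s] for name, s in layout.items()
       if not np.all(np.isfinite(dlogp[s]))}
print(bad)  # only 'sigma_log__' shows up: it holds the nan and inf
```

Whatever name shows up in `bad` is the RV whose prior (or parametrization) needs attention.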

Hope this is clear!


#5

Thanks for all these tricks…

I found no divergent parameters…

The error disappeared when I switched from the default initializer, ‘jitter+adapt_diag’, to ‘adapt_diag’ (alone, without the jitter).

I guess the jitter trick somehow pushes some parameters outside their legitimate support…?

Does that make sense?
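For context, ‘jitter+adapt_diag’ perturbs each start value by a Uniform(-1, 1) draw. A toy sketch of the failure mode (not the actual pymc3 internals, and the distribution is chosen only for illustration): if such a perturbation lands outside a parameter’s support, the logp at the jittered point is -inf, which produces exactly a bad initial energy.

```python
import numpy as np
from scipy import stats

start = 0.3                # start value of a positive-only parameter
jittered = start + (-0.9)  # one possible Uniform(-1, 1) jitter draw

# Outside the support, the log density is -inf, which poisons the
# initial energy exactly as in the error message above.
logp = stats.expon.logpdf(jittered)
print(jittered, logp)  # -0.6 -inf
assert np.isneginf(logp)
```

In practice pymc3 jitters in the transformed (unbounded) space, so outright support violations are rarer; but an extreme jittered point can still make the logp or its gradient overflow, with the same symptom.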


#6

Yep, makes sense - that can happen.