I’m trying to run a model on an AWS instance running Red Hat, and I keep getting the error “Bad initial energy: nan. The model might be misspecified.” The same model runs successfully on my local Ubuntu desktop. For simplicity, I also tried running the extremely simple
import pymc3 as pm

with pm.Model() as model:
    p = pm.Uniform('p', lower=0.0, upper=1.0)
    b = pm.Binomial('b', 100, p, observed=10)

with model:
    start = pm.find_MAP(model=model)
    trace = pm.sample(10000, start=start)
and I get the same behavior: it works on my Ubuntu desktop and it fails on AWS.
The versions of NumPy (1.13.3), SciPy (1.0.0), Theano (0.9.0), and PyMC3 (3.2) are the same on both machines.
Taking a look at step_methods/hmc/nuts.py, it seems that at some point the call to self.potential.random() at line 177 returns [inf], leading to infinite energy. This only happens on the AWS machine.
I’m completely unsure of how to proceed here. Any help would be appreciated. Thank you.
What are the Theano float and int types on your EC2 instance? With such a simple model, the only possibility I can think of is that p was rounded down to zero…
I realized my original model (more complex) also stopped working on my desktop after I upgraded PyMC3 to try to reproduce the EC2 behavior (the simple model described above does still work). I’m not sure which version I was running before, but it had been a few months since I downloaded it. I downgraded PyMC3 to 3.1 with pip install -I --no-cache-dir 'pymc3<3.2' --no-deps, and now both the simple model and the complex model work on both my laptop and EC2. This will work for my needs, but it’s probably worth investigating what’s going on.
I was getting the same error yesterday. I am trying out multinomial regression with continuous predictors. I think it has something to do with using NUTS on a discrete output. But I am not sure about this, so I’d love to hear from the pros about it. I switched to Metropolis steps and am getting mostly reasonable results.
I have an implementation of softmax regression with a multinomial observed variable, and it runs fine for me. How do you deal with the continuous predictor? If you also apply a softmax to make p sum to one, you should try restricting one column to zero or setting all the priors to N(0, 1).
Yeah, the softmax often makes the model unidentifiable, because the scaling does not play a role anymore. In my problem, setting a Normal(0, 1) prior works well (I might even get rid of the HalfCauchy). Another solution is to restrict one of the columns of mu to zero.
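The shift-invariance behind this unidentifiability is easy to demonstrate with plain NumPy (a minimal sketch, independent of PyMC3): adding any constant to all of the softmax inputs leaves the output unchanged, which is exactly the degree of freedom that pinning one column to zero removes.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max())
    return e / e.sum()

mu = np.array([1.0, 2.0, 3.0])
shifted = mu + 5.0  # add the same constant to every component

# The probabilities are identical, so the sampler cannot
# distinguish mu from mu + c for any constant c
print(np.allclose(softmax(mu), softmax(shifted)))  # True
```

Fixing one component (say, the first) to zero picks a single representative from each equivalence class, which is why the model becomes identifiable again.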
The model you are running now seems fine. Did you try it with the defaults (for example, just trace = pm.sample(1000, tune=1000, njobs=4))?
In that case, it is usually an indication that there is some latent problem somewhere. It would be helpful if you posted the real model with some simulation code.
I’ll need to do some rewriting before sharing it here, but sure. However, can you think of any reason why it would work with PyMC3 3.1 and not PyMC3 3.2? I think the main point here is this difference, which is also observed with the extremely simple model above on AWS: version 3.1 works while 3.2 does not.