Metropolis sampler does not sample (traceplot gives flat plots)

Hi!

I’m trying to implement logistic regression. The problem is roughly the same as in this tutorial, and the dataset is a modified version of the one used there; I do not know how it was modified.

Here is the code that I run:

import numpy as np
import pymc3 as pm

# add a squared-age column to the data
data['ages'] = np.square(data['age'])

with pm.Model() as logistic_model:
    pm.glm.GLM.from_formula('income_more_50K ~ age + ages + hours + educ', data,
                            family=pm.glm.families.Binomial())
    start = pm.find_MAP()
    sampler = pm.Metropolis()

    trace_logistic_model = pm.sample(400, step=sampler, start=start, init='adapt_diag')

The point is: without ages (i.e. age squared) added to the model, PyMC3 runs as expected, and traceplot shows the expected plots. But when I add ages and call traceplot, I get completely flat figures, as you can see below.

Given that everything works as expected with the original tutorial and the original dataset, there must be something in the data that breaks PyMC3. But I am not able to find it.

Can someone give me a hint?

Thank you!

You can call logistic_model.check_test_point() to see if any random variable gives a nan or inf logp.
Otherwise, the problem is likely that Metropolis is stuck at the initial value. The recommendation is to sample using the default settings, which in most cases take care of these problems:

trace_logistic_model = pm.sample(1000, tune=2000)

I tried to call logistic_model.test_point(), since the check_test_point method is not present in this version (3.4.1), but the output is TypeError: 'dict' object is not callable.

And if I use the NUTS sampler instead (even though in this exercise I should use the Metropolis one), the sampling does not start at all, and I get:
ValueError: Bad initial energy: inf. The model might be misspecified.

If I were you I would dig into the data to see if it could cause overflows or underflows when computing the likelihood; in this case, ages. I’d suggest standardizing each feature by subtracting its mean value and dividing by its standard deviation.
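For example, a minimal sketch of that standardization, assuming data is a pandas DataFrame with the columns used in the formula above:

for col in ['age', 'ages', 'hours', 'educ']:
    # centre each predictor and scale it to unit standard deviation
    data[col] = (data[col] - data[col].mean()) / data[col].std()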

.test_point is a different thing: it prints out the default values. For the equivalent of check_test_point() you need to do:

for RV in model.basic_RVs:
    print(RV.name, RV.logp(model.test_point))

OK, the output of your suggestion is:

Intercept 0.0
age -7.82669381218681
ages -7.82669381218681
hours -7.82669381218681
educ -7.82669381218681
y -15638.093540616184

All the coefficients give the same value.

What’s more: now not even a copy-paste of the code posted here works (I get the bad initial energy error).

I tried deleting the Theano cache, but nothing changed.

Is there any kind of PyMC cache that I can flush?

As @rlouf mentioned, this is quite possibly a problem of overflow, as ages (the squared age) makes the linear predictor quite big.
You can try specifying a more informative prior:

pm.glm.GLM.from_formula('income_more_50K ~ age + ages + hours + educ', data,
    family=pm.glm.families.Binomial(),
    priors=dict(ages=pm.Normal.dist(mu=0, sd=2.5)))

I found the source of the issue: the MAP estimation.

If I remove the start parameter in pm.sample() (see the sketch after this list):

  1. the NUTS sampler no longer gives the ValueError: Bad initial energy: inf. The model might be misspecified. error (YAY!)
  2. the Metropolis-Hastings sampler actually samples, and traceplot shows the samples
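
A minimal sketch of the call that now works for me, assuming the same logistic_model context as above:

with logistic_model:
    sampler = pm.Metropolis()
    # no start=pm.find_MAP() here; let PyMC3 choose the initial point itself
    trace_logistic_model = pm.sample(400, step=sampler)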

So, in some way, the MAP assigns some kind of unacceptable value to the coefficient of ages (the squared age)?
Is it a bug?

Well, that’s great! Do you get reasonable samples? Do you get any warning from NUTS?

Now that you mention it, I’ve had issues with MAP initialization in the past which is why—correct me if I’m wrong—it is not the default initialization anymore. @junpenglao will know better.

PS: It might still be a problem of overflow. Let me have a look at the codebase.

Could you post the output of pm.find_MAP() here?

Most likely some initial value (the coefficient for the squared age in this case) is a very unlikely value, which makes the gradient of that RV very small (think of a value in the far tail of a Gaussian). The optimiser thus has trouble converging and may terminate early by mistake. Some specific optimisers are said to deal better with this kind of problem.
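For example, a gradient-free optimiser can be less sensitive to tiny gradients in the tails; a sketch (whether it actually helps in this particular case is untested):

with logistic_model:
    # 'Powell' is a gradient-free method, passed through to scipy.optimize
    start = pm.find_MAP(method='Powell')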

And yes, we don’t recommend using the MAP as initialisation: the gradient around the MAP is also very small, which can make sampling problematic.

This is the output of MAP:

{'Intercept': array(0.00488243),
 'age': array(0.11477027),
 'ages': array(2.49833873),
 'educ': array(0.05535978),
 'hours': array(0.23443527)}

The coefficient associated with ages is bigger than the others, but it is not a NaN or anything similar…