Metropolis sampler do not sample (traceplot gives flat plots)

VLavorini · June 12, 2018, 3:39pm

Hi!

I’m trying to implement logistic regression. The problem circa the same of this tutorial, and the dataset is a modified version of the one named. I do not know how is modified.

Here is the code that I run:

data['ages']=np.square(data['age'])

with pm.Model() as logistic_model:
    pm.glm.GLM.from_formula('income_more_50K ~  age  + ages + hours + educ', data, family=pm.glm.families.Binomial())
    start=pm.find_MAP()
    sampler=pm.Metropolis()

    trace_logistic_model = pm.sample(400, step=sampler, start=start, init='adapt_diag')

The point is: without ages (i.e. age squared) added to the model, PyMC runs as expected, and I can get the plots of traceplot. But when I add ages and call traceplot, I get completely flat figures, as you can see below.

Given that the original tutorial, with the original dataset, everything works as expected, there should be something in the data that break PyMC3. But I am not able to find it.

Can someone give me an hint?

Thank you!

junpenglao · June 12, 2018, 3:54pm

You can call logistic_model.check_test_point() to see if there is any random variable that gives nan or inf logp.
Otherwise, it is the problem of Metropolis stuck on the initial value. The recommandation is to sample using the default setting which in most case take care of these problems:

trace_logistic_model = pm.sample(1000, tune=2000)

VLavorini · June 14, 2018, 8:30am

I tried to call logistic_model.test_point(), since the check_test_point method is not present in this version (3.4.1). But the output is TypeError: 'dict' object is not callable.

While if I use the NUTS sampler (even if in this exercise I have shoud use the Metropolis one ), the sampling do not start at all, and I get:
ValueError: Bad initial energy: inf. The model might be misspecified.

rlouf · June 14, 2018, 8:51am

If I were you I would dig into the data to see if it could cause overflows or underflows when computing the likelihood. In this case, ages. I’d suggest renormalizing each favor by substracting its mean value and dividing by the standard deviation.

junpenglao · June 14, 2018, 9:54am

.test_point is a different thing - it prints out the default values. For the equvalent of check_test_point() you need to do:

for RV in model.basic_RVs:
    print(RV.name, RV.logp(model.test_point))

VLavorini · June 14, 2018, 11:06am

ok, the output of your suggestion is:

Intercept 0.0
age -7.82669381218681
ages -7.82669381218681
hours -7.82669381218681
educ -7.82669381218681
y -15638.093540616184

all the same.

But more: now not even the copy-paste of the code in here works (I got the error about the bad initial energy).

I tried to delete the Theano cache, but nothing changes.

Is there any kind of cache of PyMC that I can flush?

junpenglao · June 14, 2018, 11:24am

As @rlouf mentioned, this is quite possible problem of overflows, as ages^2 makes the linear prediction quite big.
You can try specifying a more informative prior:

pm.glm.GLM.from_formula('income_more_50K ~  age  + ages + hours + educ', data, 
    family=pm.glm.families.Binomial(), 
    priors=dict(ages=pm.Normal.dist(mu=0, sd=2.5))

VLavorini · June 14, 2018, 4:27pm

I found the source of the issue: the MAP estimation.

If I remove the start parameter in pm.sample():

the NUTS sampler do not give the ValueError: Bad initial energy: inf. The model might be misspecified. anymore (YAY!)
the Metropolis-Hastings sampler actually samples, and traceplot show the samples

So, in some way, the MAP gives some kind of unacceptable value to the parameter of ages (the age squared)?
Is it a bug?

rlouf · June 15, 2018, 6:08am

Well, that’s great! Do you get reasonable samples? Do you get any warning from NUTS?

Now that you mention it, I’ve had issues with MAP initialization in the past which is why—correct me if I’m wrong—it is not the default initialization anymore. @junpenglao will know better.

Ps: It might still be a problem of overflows. Let me have a look in the codebase.

rlouf · June 15, 2018, 6:18am

Could you post the output of pm.find_MAP() here?

junpenglao · June 15, 2018, 6:25am

Most likely some initial value (the coefficient for age^2 in this case) is a very unlikely value, which makes the gradient of the RV very small (think about a value at the far tail of the gaussian). The optimiser thus having issue converging as it terminated early mistakenly. Some specific optimiser are said to deal with this kind of problem.

And yes, we dont recommend using MAP as initialisation - the gradient around MAP is also very small which can make sampling also problematic.

VLavorini · June 15, 2018, 11:46am

This is the output of MAP:

{'Intercept': array(0.00488243),
 'age': array(0.11477027),
 'ages': array(2.49833873),
 'educ': array(0.05535978),
 'hours': array(0.23443527)}

The coefficient associated to ages is bigger w.r.t. the others, but is not a NaN or something similar…

Topic		Replies	Views
Traceplot acceptability Questions	5	908	May 6, 2018
Issues sampling linear models Questions	2	373	April 14, 2020
Initial evaluation of model at starting point failed when running a mixture model Questions bug	2	1201	October 8, 2021
Trying to sample the Vanderplas model from the docs Questions	4	647	March 23, 2018
PyMC3 traceplot not displaying Questions	2	2035	November 21, 2017

Metropolis sampler do not sample (traceplot gives flat plots)

Related topics