I’m trying to implement logistic regression. The problem is essentially the same as in this tutorial, and the dataset is a modified version of the one it uses; I do not know how it was modified.
Here is the code that I run:
```python
import numpy as np
import pymc3 as pm

# Add a squared-age feature
data['ages'] = np.square(data['age'])

with pm.Model() as logistic_model:
    pm.glm.GLM.from_formula('income_more_50K ~ age + ages + hours + educ',
                            data, family=pm.glm.families.Binomial())
    start = pm.find_MAP()
    sampler = pm.Metropolis()
    trace_logistic_model = pm.sample(400, step=sampler, start=start,
                                     init='adapt_diag')
```
The point is: without ages (i.e. age squared) in the model, PyMC3 runs as expected and I get the usual traceplot figures. But when I add ages and call traceplot, the figures come out completely flat, as you can see below.
Given that everything works as expected in the original tutorial with the original dataset, there must be something in the data that breaks PyMC3, but I am not able to find it.
You can call logistic_model.check_test_point() to see if there is any random variable that gives nan or inf logp.
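A minimal sketch of that check (the method exists in recent PyMC3 releases):

```python
# Prints the logp of each free RV evaluated at the model's test point;
# a nan or -inf entry identifies the variable that breaks the likelihood
print(logistic_model.check_test_point())
```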
Otherwise, the problem is that Metropolis is stuck at the initial value. The recommendation is to sample using the default settings, which in most cases take care of these problems:
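Something like this (a sketch; the draw count is arbitrary):

```python
with logistic_model:
    # No step method and no start value: pm.sample picks the sampler
    # and its initialisation automatically
    trace_logistic_model = pm.sample(1000)
```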
I tried to call logistic_model.test_point(), since the check_test_point method is not present in this version (3.4.1), but the output is TypeError: 'dict' object is not callable.
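From the error it looks like test_point is a plain dict rather than a method, so presumably it can be inspected without the parentheses:

```python
# test_point is a dict attribute mapping each free RV to its initial value
print(logistic_model.test_point)
```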
If instead I use the NUTS sampler (even though in this exercise I am supposed to use Metropolis), the sampling does not start at all, and I get: ValueError: Bad initial energy: inf. The model might be misspecified.
If I were you I would dig into the data to see if it could cause overflows or underflows when computing the likelihood; in this case, ages. I’d suggest renormalizing each feature by subtracting its mean value and dividing by the standard deviation.
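A minimal sketch of that renormalization, assuming data is a pandas DataFrame with the columns from the formula:

```python
# Standardize each predictor to zero mean and unit standard deviation
for col in ['age', 'ages', 'hours', 'educ']:
    data[col] = (data[col] - data[col].mean()) / data[col].std()
```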
As @rlouf mentioned, this is quite possibly a problem of overflows, as ages (i.e. age squared) makes the linear predictor quite large.
You can try specifying a more informative prior:
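For example (a sketch: 'Regressor' is the key PyMC3's GLM uses for the default slope prior, and sd=10 is just an illustrative choice):

```python
with pm.Model() as logistic_model:
    pm.glm.GLM.from_formula(
        'income_more_50K ~ age + ages + hours + educ', data,
        family=pm.glm.families.Binomial(),
        # A tighter Normal prior on the coefficients keeps the
        # linear predictor from blowing up
        priors={'Regressor': pm.Normal.dist(mu=0, sd=10)})
```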
Well, that’s great! Do you get reasonable samples? Do you get any warning from NUTS?
Now that you mention it, I’ve had issues with MAP initialization in the past which is why—correct me if I’m wrong—it is not the default initialization anymore. @junpenglao will know better.
PS: It might still be a problem of overflows. Let me have a look at the codebase.
Most likely some initial value (the coefficient for age squared in this case) is very unlikely, which makes the gradient of that RV very small (think of a value in the far tail of a Gaussian). The optimiser thus has trouble converging, as it terminates early by mistake. Some specific optimisers are said to deal with this kind of problem, as sketched below.
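A sketch of that, assuming MAP initialisation is wanted at all (Powell is one gradient-free method scipy exposes through find_MAP):

```python
# Powell does not use gradients, so a flat far tail does not stall it
start = pm.find_MAP(method='Powell', model=logistic_model)
```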
And yes, we don't recommend using MAP as initialisation: the gradient around the MAP is also very small, which can make sampling problematic as well.