# Issues with Poisson Regression using GLM method

Hi,

I was following an example described in https://docs.pymc.io/notebooks/GLM-poisson-regression.html.

This is how the dataset looks like:

``````   numvisit	badh	age
1	30	0	58
2	20	0	54
3	16	0	44
4	20	0	57
5	15	0	33
``````

The code is super simple:

``````with pm.Model() as model:
pm.glm.GLM.from_formula("numvisit ~ badh + age", df, family=pm.glm.families.Poisson())
``````

After sampling from the trace, I am getting the â€śValueError: Mass matrix contains zeros on the diagonal. Some derivatives might always be zero.â€ť After browsing other questions related to this subject, I printed my test values which might suggest an overflow. Although I am not convinced in that.

``````Intercept 0.0
age -7.82669381218681
mu_log__ -2.7641181592074506
y -3838.248213052888
``````

After attempting to fix the problem with log(y) instead of y, I have got a different issue.â€śBad initial energy: inf. The model might be misspecified.â€ť I suspect this is due to the fact that some values are 0s and then the logs getâ€™s messed up. I printed my test values and they support the assumption:

``````Intercept 0.0
age -7.82669381218681
mu_log__ -2.7641181592074506
y -inf
``````

I went with the more native PyMC3 approach, and implemented the model as follows:

``````with pm.Model() as model1:
intercept = pm.Normal("intercept", mu=0, sd=2, testval=0)
b_age = pm.Normal("b_age", mu=0, sd=2, testval=0)
numvisits = pm.Poisson("numvisits", mu=np.exp(log_lam), observed=df.numvisit.values)
``````

And here I am back with the â€śValueError: Mass matrix contains zeros on the diagonal. Some derivatives might always be zero.â€ť

What other ideas can I try here to troubleshoot the issue. Note, I have experimented with various ranges of sd (from 1 to 1e4), also experimented with tau instead of sd (ranges 0.1 to 0.0001) and multiple test values.

I will appreciate hints on further troubleshooting. Thank you.

With the raw input data range, you quickly ran into overflow problem. For example, if the coefficients are 1s (which is quite reasonable given the priors), you have a prediction of the first row being `np.exp(50)` --> 5.184705528587072e+21

A usual treatment is to standardize or scales your input to a more reasonable range, for example, divide the age by 100.

This is great! I added the following code to make this work according to the suggestion:

``````scaler = StandardScaler()
scaler.fit(df)
scaled_df = pd.DataFrame(scaler.transform(df), columns=df.columns)
# reset the binary column