Simple linear model fails

I’m trying to run a simple linear model on the famous “diamonds” dataset, but I’m having trouble with the HMC sampling. What am I doing wrong?

import pymc3
import numpy as np
import pandas as pd

diamonds = pd.read_csv(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/diamonds.csv"
)
diamonds = diamonds.sample(n=1_000_000, replace=True)

print(diamonds)


                                                                                                                                                             
#        carat        cut color clarity  depth  table  price     x     y     z
# 38867   0.40    Premium     F     VS1   61.4   58.0   1050  4.75  4.73  2.91
# 18067   2.01       Fair     F      I1   58.7   66.0   7294  8.30  8.19  4.84
# 1507    0.71  Very Good     F     VS2   59.6   56.0   2994  5.84  5.88  3.49
# 6618    0.90    Premium     H     VS2   60.7   58.0   4082  6.21  6.17  3.76
# 39269   0.38    Premium     G     VS1   61.9   58.0   1069  4.66  4.62  2.87
# ...      ...        ...   ...     ...    ...    ...    ...   ...   ...   ...
# 5749    0.98  Very Good     E     SI2   61.1   60.0   3895  6.31  6.36  3.87
# 20508   1.70    Premium     I     VS1   61.5   58.0   8840  7.74  7.64  4.73
# 52531   0.72      Ideal     I    VVS2   61.7   55.0   2530  5.71  5.76  3.54
# 12146   1.00       Good     E     SI1   63.7   60.0   5174  6.29  6.24  3.99
# 23398   0.36      Ideal     E     SI1   62.0   57.0    631  4.53  4.57  2.82
                                                                                   

model = pymc3.glm.GLM.from_formula(
    "price ~ C(cut) + C(color) + C(clarity) + carat + depth + table + x + z",
    data=diamonds,
)
fit = pymc3.sample(model=model, tune=20005)


# ValueError: Mass matrix contains zeros on the diagonal.
# The derivative of RV `z`.ravel()[0] is zero.
# The derivative of RV `x`.ravel()[0] is zero.
# The derivative of RV `table`.ravel()[0] is zero.
# The derivative of RV `depth`.ravel()[0] is zero.
# The derivative of RV `carat`.ravel()[0] is zero.
# The derivative of RV `C(clarity)[T.VVS2]`.ravel()[0] is zero.
# The derivative of RV `C(clarity)[T.VVS1]`.ravel()[0] is zero.
# The derivative of RV `C(clarity)[T.VS2]`.ravel()[0] is zero.
# The derivative of RV `C(clarity)[T.VS1]`.ravel()[0] is zero.
# The derivative of RV `C(clarity)[T.SI2]`.ravel()[0] is zero.
# The derivative of RV `C(clarity)[T.SI1]`.ravel()[0] is zero.
# The derivative of RV `C(clarity)[T.IF]`.ravel()[0] is zero.
# The derivative of RV `C(color)[T.J]`.ravel()[0] is zero.
# The derivative of RV `C(color)[T.I]`.ravel()[0] is zero.
# The derivative of RV `C(color)[T.H]`.ravel()[0] is zero.
# The derivative of RV `C(color)[T.G]`.ravel()[0] is zero.
# The derivative of RV `C(color)[T.F]`.ravel()[0] is zero.
# The derivative of RV `C(color)[T.E]`.ravel()[0] is zero.
# The derivative of RV `C(cut)[T.Very Good]`.ravel()[0] is zero.
# The derivative of RV `C(cut)[T.Premium]`.ravel()[0] is zero.
# The derivative of RV `C(cut)[T.Ideal]`.ravel()[0] is zero.
# The derivative of RV `C(cut)[T.Good]`.ravel()[0] is zero.
# The derivative of RV `Intercept`.ravel()[0] is zero.

Hi Mina,
Is your venv running PyMC master? If so, your issue is probably related to this one, and you should find your answer there.
PS: I think the GLM module uses flat priors, which can also be a cause of the error you’re experiencing.
Hope this helps :vulcan_salute:

Changing the prior fixes the issue. Thanks.
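
For anyone who lands here later, this is a rough sketch of what "changing the prior" can look like. The thread does not say which priors were actually used, so the Normal priors and the `PRIOR_SIGMA` value are assumptions chosen only for illustration, and `build_model` is a hypothetical helper name. It relies on the `priors` argument of `GLM.from_formula`, where, as far as I understand the PyMC3 3.x GLM module, the `"Intercept"` key sets the intercept prior and the `"Regressor"` key sets the default for all other coefficients.

```python
# Formula taken from the original post.
FORMULA = "price ~ C(cut) + C(color) + C(clarity) + carat + depth + table + x + z"

# Assumed prior scale: wide relative to diamond prices, but not flat.
PRIOR_SIGMA = 10_000.0


def build_model(data, sigma=PRIOR_SIGMA):
    """Build the GLM with explicit Normal priors (hypothetical helper).

    `data` is expected to be the diamonds DataFrame from the original post.
    """
    import pymc3  # imported here so the module loads without PyMC3 installed

    priors = {
        # Prior for the intercept term.
        "Intercept": pymc3.Normal.dist(mu=0.0, sigma=sigma),
        # Default prior for every regression coefficient
        # (assumption: "Regressor" is the fallback key in PyMC3 3.x).
        "Regressor": pymc3.Normal.dist(mu=0.0, sigma=sigma),
    }
    return pymc3.glm.GLM.from_formula(FORMULA, data=data, priors=priors)
```

Usage would then be `model = build_model(diamonds)` followed by `fit = pymc3.sample(model=model)`. Whether this scale is sensible depends on the data, so treat it as a starting point rather than a recommendation.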
