I’m creating a very simple linear model with GLM:
import pymc3 as pm

with pm.Model() as linear_model:
    pm.GLM.from_formula("score ~ C(myvar, Treatment)", df)
df is a very simple dataframe (with 5k samples) where score is a numeric value and myvar is just a categorical variable. However, when I try to sample from it, as in:
with linear_model:
    samples = pm.sample(2000, tune=500)
It just stays stuck at this message:
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, C(Q006, Treatment)[T.Q], C(Q006, Treatment)[T.P], C(Q006, Treatment)[T.O], C(Q006, Treatment)[T.N], C(Q006, Treatment)[T.M], C(Q006, Treatment)[T.L], C(Q006, Treatment)[T.K], C(Q006, Treatment)[T.J], C(Q006, Treatment)[T.I], C(Q006, Treatment)[T.H], C(Q006, Treatment)[T.G], C(Q006, Treatment)[T.F], C(Q006, Treatment)[T.E], C(Q006, Treatment)[T.D], C(Q006, Treatment)[T.C], C(Q006, Treatment)[T.B], Intercept]
Sampling 2 chains: 0%| | 0/5000 [00:00<?, ?draws/s]
Nothing gets sampled. What is even weirder is that the Python process is just idle and its memory usage stays constant. Is this a known issue? I can’t see anything wrong with the model.
UPDATE: if I pass cores=1 to pm.sample, it does start sampling, but then it raises another error:
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV `Q006[T.N]`.ravel()[0] is zero.
The derivative of RV `Q006[T.H]`.ravel()[0] is zero.
The derivative of RV `Q006[T.F]`.ravel()[0] is zero.
I suppose this is related to my problem, but there definitely seems to be an issue with multiprocessing and GLM as well.
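One thing that may be worth checking for the zero-derivative error is whether some levels of the categorical variable have no rows at all. An all-zero dummy column would leave the likelihood independent of that coefficient. This is only a guess at a possible cause, not a confirmed diagnosis. Below is a minimal sketch of such a check, assuming the column is stored as a pandas categorical; the helper name empty_levels and the toy dataframe are hypothetical and not my real data.

```python
import pandas as pd


def empty_levels(df, cat_col):
    """Return the declared levels of a categorical column that have no rows.

    Hypothetical helper for illustration only.
    """
    # For a categorical dtype, value_counts() also reports unused levels
    # with a count of 0; sort=False keeps the declared category order.
    counts = df[cat_col].value_counts(sort=False)
    return [level for level, n in counts.items() if n == 0]


# Toy data (made up): level "C" is declared but never observed.
toy = pd.DataFrame({
    "myvar": pd.Categorical(["A", "A", "B", "B"], categories=["A", "B", "C"]),
    "score": [1.0, 2.0, 3.0, 5.0],
})

unused = empty_levels(toy, "myvar")
# Missing scores are a separate thing worth ruling out.
n_missing_scores = int(toy["score"].isna().sum())

print("levels with no rows:", unused)
print("missing scores:", n_missing_scores)
```

If the list is non-empty for the real dataframe, calling df["myvar"].cat.remove_unused_categories() before building the model would drop those levels, though that may not be the whole story behind the error.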