Hi, I’m relatively new to pymc3 but hope someone can help. I have a very large model. When I run it for most of my datasets, it works fine and samples without problems.
However, with a few of my datasets, it works intermittently. It sometimes finishes successfully like the others, but often crashes with the “ValueError: Mass matrix contains zeros on the diagonal” error. It will also say:
The derivative of RV `myvar_log__`.ravel() is zero.
for a bunch of the variables.
I’ve read a few threads here about this error, but I haven’t yet seen an explanation or a way of figuring out its cause. I know the cause of the problem can probably be specific to my model, so I’m wondering about a general way of approaching this: what does the error actually mean, and where can I look to debug it?
any tips are appreciated, thank you.
You can find some pointers here: Frequently Asked Questions
Hi @junpenglao ,
thanks for the response. I read that thread. It appears to give a way to identify the RV causing the trouble, but I already know which one it is (it’s in the error above).
Also, it seems like the error he had was “bad initial energy: nan”, whereas mine says “derivative of RV is zero”. Are those actually the same error?
I’ve also tried replacing ‘jitter+adapt_diag’ with just ‘adapt_diag’, but no change.
What’s strange is that I run the same model with several data sets, and they all work except for this one.
What steps can I try to take to diagnose the problem?
Yeah, I think these errors usually relate to the same problem – NaN values or zeros somewhere, which make the log of those values undefined (if I remember correctly).
It’s hard to say without looking at the model, but have you standardized your predictors / outcome variables? If they are on large scales, it can mess up sampling in exactly that way.
yeah, unfortunately the model is very large at this point, fitting to a bunch of data, so it’s hard to provide an MWE
However, one clue is that it works for all the other data sets, but the one it fails with is missing one of the observables (i.e., a “column” of the data) that the others all have. So it’s underconstrained. My current theory is that because it’s underconstrained, the “energy landscape” might be really flat there, leading to a zero derivative?
What do you mean by standardized, do you mean normalizing them so that they have mean ~ 0 and SD ~ 1 or something?
So I did kinda get it to work, though it’s definitely hacky. I used the `start` arg of `pm.sample()`, passing the last trace point of a previous successful run (with the same dataset), and it works. But that kinda depends on getting a successful run at some point…
Yeah exactly – that can really do magic!
Other than that, it’s difficult for me to say anything else without the model.
Hope this helps anyway, and good luck!
ah yeah. I know that’s standard for features in deep learning, but I hadn’t heard of using it here… it works well for MCMC too?
Definitely works like a charm – should be the default behavior!
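For anyone finding this later: “standardizing” here just means z-scoring each predictor (and usually the outcome) before building the model. A plain NumPy sketch (`X` is a placeholder design matrix, not from the thread):

```python
import numpy as np

def standardize(x):
    """Return a z-scored copy of x: mean ~ 0, SD ~ 1 along axis 0."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Placeholder predictors on very different scales:
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])
Xz = standardize(X)
# Each column of Xz now has mean ~ 0 and SD ~ 1.
```

If you standardize, remember that the posterior is then on the standardized scale, so transform coefficients back if you want to interpret them in the original units.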