"ValueError: Mass matrix contains zeros on the diagonal" error, how to debug?

Hi, I’m relatively new to pymc3 but hope someone can help. I have a very large model. When I run it for most of my datasets, it works fine and samples without problems.

However, with a few of my datasets, it works intermittently. It sometimes finishes successfully like the others, but often crashes with the “ValueError: Mass matrix contains zeros on the diagonal” error. It will also say:

The derivative of RV `myvar_log__`.ravel()[0] is zero.

for a bunch of the variables.

I’ve read a few threads here about this error, but I haven’t yet seen an explanation or a way of figuring out the cause of it. I know the cause of the problem is probably specific to my model, so I’m wondering about a general way of approaching this: what does the error actually mean, and where can I look to debug it?

any tips are appreciated, thank you.

You can find some pointers here: Frequently Asked Questions

Hi @junpenglao ,

thanks for the response. I read that thread. It appears to be giving a way to identify the RV causing the trouble, but I already know which it is (it’s in the error above).

Also, it seems like the error he had was “Bad initial energy: nan”, whereas mine is “derivative of RV is zero”. Are those actually the same problem?

I also have tried replacing ‘jitter+adapt_diag’ with just ‘adapt_diag’, but no change.

What’s strange is that I run the same model with several data sets, and they all work except for this one.

What steps can I try to take to diagnose the problem?

thank you!

Yeah I think usually these errors relate to the same underlying problems – NaN values or zeros somewhere, making the log of these values undefined (if I remember correctly).

It’s hard to say without looking at the model, but have you standardized your predictors / outcome variables? If they are on large scales, it can mess up sampling in exactly this way.
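To be concrete, a minimal sketch of what I mean by standardizing (plain NumPy; the data here is made up):

```python
import numpy as np

def standardize(x):
    """Rescale a variable to mean 0 and standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Hypothetical predictor on a large scale, e.g. incomes in dollars
income = np.array([32_000.0, 54_000.0, 81_000.0, 120_000.0])
income_z = standardize(income)  # now on a unit scale, roughly in [-2, 2]
```

You'd apply this to each predictor (and often the outcome too) before building the model, then back-transform the coefficients afterwards if you need them on the original scale.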


yeah, unfortunately the model is very large at this point, fitting to a bunch of data, so it’s hard to provide a MWE :frowning:

However, one clue is that it works for all the other data sets, but for the one that it fails with, that dataset doesn’t have one of the observables (i.e., a “column” of the data) that the other ones all do. So, it’s underconstrained. My current theory is that because it’s underconstrained, the “energy landscape” might be really flat there, leading to 0 derivative?

What do you mean by standardized, do you mean normalizing them so that they have mean ~ 0 and SD ~ 1 or something?

So I did kinda get it to work, though it’s definitely hacky. I used the start arg of pm.sample() with the last trace point of a previous successful run (on the same dataset), and it works. But that depends on getting a successful run at some point…

Yeah exactly – that can really do magic!
Other than that, it’s difficult for me to say anything else without the model.
Hope this helps anyway, and good luck!

ah yeah. I know that’s standard for features in deep learning, but I hadn’t heard of using it here… it works well for MCMC too?


Definitely works like a charm – should be the default behavior!