"ValueError: Mass matrix contains zeros on the diagonal" error, how to debug?

hamburgled · February 10, 2020, 7:30pm

Hi, I’m relatively new to pymc3 but hope someone can help. I have a very large model. When I run it for most of my datasets, it works fine and samples without problems.

However, with a few of my datasets, it works intermittently. It sometimes finishes successfully like the others, but often crashes with the “ValueError: Mass matrix contains zeros on the diagonal” error. It will also say:

The derivative of RV `myvar_log__`.ravel()[0] is zero.

for a bunch of the variables.

I’ve read a few threads here about this error, but I haven’t yet seen an explanation or a way of figuring out the cause of it. I know the cause of the problem is probably specific to my model, so I’m wondering about a general way of solving this: what does the error actually mean, and where can I look to debug it?

any tips are appreciated, thank you.

junpenglao · February 10, 2020, 8:04pm

You can find some pointers here: Frequently Asked Questions

hamburgled · March 4, 2020, 5:47pm

Hi @junpenglao ,

thanks for the response. I read that thread. It appears to be giving a way to identify the RV causing the trouble, but I already know which it is (it’s in the error above).

Also, it seems like the error he had was a “bad initial energy: nan”, whereas I have the one with “derivative of RV is zero”. Are those actually the same ones?

I also have tried replacing ‘jitter+adapt_diag’ with just ‘adapt_diag’, but no change.

What’s strange is that I run the same model with several data sets, and they all work except for this one.

What steps can I try to take to diagnose the problem?

thank you!

AlexAndorra · March 7, 2020, 4:19pm

Hi,
Yeah, I think these errors usually relate to the same problems – NaN values or zeros somewhere, making the log of these values undefined (if I remember correctly).
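One way to act on this is to scan the input data for values that would break a log before sampling. The sketch below is only an illustration: `find_problem_columns` and the `example` dict are hypothetical names, and it assumes the data is available as a mapping from column names to 1-D numeric arrays.

```python
import numpy as np

def find_problem_columns(data):
    """Return {column name: list of issues} for columns that could break a log.

    `data` is a hypothetical dict mapping column names to 1-D numeric arrays.
    """
    problems = {}
    for name, values in data.items():
        arr = np.asarray(values, dtype=float)
        issues = []
        if np.isnan(arr).any():
            issues.append("nan")
        if np.isinf(arr).any():
            issues.append("inf")
        if (arr == 0).any():
            issues.append("zero")
        if issues:
            problems[name] = issues
    return problems

# Made-up data, purely for illustration.
example = {
    "ok": [1.0, 2.5, 3.0],
    "has_zero": [0.0, 1.0, 2.0],
    "has_nan": [1.0, float("nan"), 2.0],
}
report = find_problem_columns(example)
print(report)
```

A zero in the data is not always a problem; it only matters where the model takes a log of it or divides by it, so the report is a list of places to look rather than a list of bugs.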

It’s hard to say without looking at the model, but have you standardized your predictors / outcome variables? If they are on large scales, it can mess up sampling in exactly that way.
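For reference, standardizing usually means rescaling each variable to mean 0 and standard deviation 1. A minimal NumPy sketch, assuming the predictors sit in a 2-D array with one column per variable (the `standardize` helper and the synthetic `raw` data are made up for illustration):

```python
import numpy as np

def standardize(x):
    """Rescale each column to mean 0 and standard deviation 1.

    Returns the rescaled array plus the mean and std used, so results
    can be mapped back to the original scale afterwards.
    Note: a constant column has std 0 and would need special handling.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    return (x - mean) / std, mean, std

# Synthetic predictors on a large scale, for illustration only.
rng = np.random.default_rng(0)
raw = rng.normal(loc=5000.0, scale=300.0, size=(200, 3))

z, mean, std = standardize(raw)

# Undo the transformation to recover the original units.
restored = z * std + mean
```

Keeping `mean` and `std` around matters because any coefficients estimated on the standardized data are in standardized units and have to be converted back for interpretation.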

hamburgled · March 7, 2020, 4:38pm

Hey,

yeah, unfortunately the model is very large at this point, fitting to a bunch of data, so it’s hard to provide a MWE

However, one clue is that it works for all the other data sets, but for the one that it fails with, that dataset doesn’t have one of the observables (i.e., a “column” of the data) that the other ones all do. So, it’s underconstrained. My current theory is that because it’s underconstrained, the “energy landscape” might be really flat there, leading to 0 derivative?
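To make the "flat landscape" theory concrete, here is a toy illustration, not PyMC3's actual check: a log-density in which the second parameter never appears, so its derivative is exactly zero wherever you evaluate it. The functions `log_density` and `numerical_gradient` are hypothetical and exist only for this sketch.

```python
import numpy as np

def log_density(theta):
    # Toy log-density: only theta[0] is constrained; theta[1] never enters.
    return -0.5 * (theta[0] - 1.0) ** 2

def numerical_gradient(f, theta, eps=1e-6):
    """Central finite-difference gradient of f at theta."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

grad = numerical_gradient(log_density, [0.0, 3.0])
print(grad)  # the second entry is zero: that direction is completely flat
```

In a real model a proper prior on the parameter would normally keep the direction from being perfectly flat, so a missing data column may be only part of the story; evaluating the gradient at the starting point for the affected variable is still a cheap way to test the theory.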

What do you mean by standardized, do you mean normalizing them so that they have mean ~ 0 and SD ~ 1 or something?

So I did kinda get it to work, though it’s definitely hack-y. I used the start arg of pm.sample(), passing the last trace point of a previous successful run (with the same dataset), and it works. But that kinda depends on getting a successful run at some point…
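A rough sketch of that workaround, assuming the earlier run's samples are available as a dict of arrays with draws along the first axis (the `last_point` helper, the `previous` dict and the variable names in it are hypothetical). With a PyMC3 `MultiTrace` still in memory, `trace.point(-1)` should give an equivalent dict directly.

```python
import numpy as np

def last_point(samples):
    """Build a start dict from the final draw of each variable.

    `samples` is a hypothetical dict: variable name -> array of draws,
    with the draw index along the first axis.
    """
    return {name: np.asarray(draws)[-1] for name, draws in samples.items()}

# Stand-in for samples saved from a previous successful run.
previous = {
    "mu": np.array([0.1, 0.2, 0.3]),
    "sigma_log__": np.array([-1.0, -0.9, -0.8]),
}

start = last_point(previous)

# Inside the model context, the dict would then be handed to the sampler,
# e.g. pm.sample(start=start) in PyMC3 3.x.
```

This only moves the starting point to a region that is known to work; it does not explain why the default initialization fails for this dataset.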

AlexAndorra · March 7, 2020, 4:54pm

Yeah exactly – that can really do magic!
Other than that, it’s difficult for me to say anything else without the model.
Hope this helps anyway, and good luck!

hamburgled · March 7, 2020, 5:56pm

ah yeah. I know that’s standard for features in deep learning, but I hadn’t heard of using it here… it works well for MCMC too?

thanks!

AlexAndorra · March 7, 2020, 6:37pm

Definitely works like a charm – should be the default behavior!
