Model diagnostics for "Mass matrix contains zeros on the diagonal"

The model below is failing for me in the middle of a sample(), with the ValueError “Mass matrix contains zeros on the diagonal. Some derivatives might always be zero.”

Is there a way to see those derivatives and discover which one is problematic?

What I have here is a gamma distribution for y, as a function of x. I can see that the mean has a dependence on x, and it appears that I can put all of this dependence into beta, keeping alpha independent of x. imu is the inverse mean for which I claim to have a functional form. I was able to run the model by holding beta constant (and putting the x-dependence into alpha), but the data does not support that empirically, so I don’t trust that fit.

model = pm.Model()
with model:
    lnmu = pm.HalfNormal('lnmu_0', sd=1)
    lam = pm.HalfNormal('lambda', sd=10)
    imu = lnmu*tt.exp(lam*tt.log(100 + x))
    power = pm.HalfNormal('alpha', sd=1)
    decay = pm.Deterministic('beta', power * imu)
    obs = pm.Gamma('obs', alpha=power, beta=decay, observed=y)

This is quite a difficult problem to debug usually - @aseyboldt do you know any good way to solve this?
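One generic check is to evaluate the gradient of the joint log-density at the starting point and look for components that are numerically zero (in PyMC3, model.check_test_point() is also handy for spotting per-variable -inf terms). Here is a library-free sketch of the idea, using finite differences on a toy log-density (not the model above) in which one term has underflowed:

```python
import numpy as np

def grad_fd(logp, x, eps=1e-6):
    """Central finite-difference gradient of a log-density."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (logp(x + d) - logp(x - d)) / (2 * eps)
    return g

# Toy log-density: the second coordinate enters only through a term
# that underflows to exactly 0.0 at this point, so its derivative is zero.
def logp(x):
    return -0.5 * x[0] ** 2 - np.exp(-50.0 * x[1])

g = grad_fd(logp, np.array([1.0, 20.0]))
print(g)  # first component ~ -1.0, second exactly 0.0
```

A zero component like that is exactly what produces the "zeros on the diagonal" mass-matrix error.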

My guess is that this is due to overflows. lam can easily be something like 20, so imu can be something on the order of 1e20. I don’t think you want a gamma distribution with beta=1e20. What are you trying to model? It might be better or at least easier to build a model around log(y) instead.
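Plugging plausible prior values into the expression for imu makes the scale obvious (the x values here are made up):

```python
import numpy as np

lam = 20.0                         # well within HalfNormal(sd=10)
x = np.array([0.0, 10.0, 100.0])   # made-up covariate values
# imu = lnmu * exp(lam * log(100 + x)) == lnmu * (100 + x)**lam
imu = np.exp(lam * np.log(100.0 + x))
print(imu)  # ~1e40 and up: absurdly large for a Gamma rate parameter
```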

Thanks, that makes sense. I do want beta to grow as a function of x, but I will add a cutoff to keep it reasonable.

I am also facing the same issue with my code. I am very new to this; can you please help?
I tried with log(y) as well, but it didn’t resolve the issue.

with pm.Model() as model:
    p = pm.Gamma('p', alpha=1, beta=3, shape=regions.shape[0])
    q = pm.Gamma('q', alpha=1, beta=3, shape=regions.shape[0])
    m = pm.Lognormal('m', mu=np.log(total_M), sd=.25, shape=regions.shape[0])
    t = pm.Uniform('t', lower=0, upper=100, observed=sales.t)
    cid = pm.Categorical('cid', p=np.repeat(1./sales.shape[0], sales.shape[0]), observed=sales.region )
    sigma = pm.Gamma('sigma', alpha=1, beta=3)
    mu = m[cid]*(((p[cid]+q[cid])**2)/p[cid])*((np.exp(-(p[cid]+q[cid])*t))/((1+(q[cid]/p[cid])*np.exp(-(p[cid]+q[cid])*t))**2))
    Y_obs = pm.Normal('Ft', mu= mu, sd=sigma, observed= sales.sales)
    trace = pm.sample(100000, init='adapt_diag', progressbar=True, tune=1000)

It gives the same error:
ValueError: Mass matrix contains zeros on the diagonal. Some derivatives might always be zero.

As aseyboldt pointed out, this is likely because of overflow; I’m not sure about the scale of mu. Also, as a side note, I’d decouple mu and sigma and have something like:

nu = pm.Normal("nu", 0, 1)
Y_obs = pm.Deterministic('Ft', nu * sigma + mu, observed = sales.sales)

You should have fewer divergences. See the Gelman Schools example in the pymc3 docs.

You cannot assign observed to a Deterministic node. However, your point about the scale of the prior (and its shape, for that matter) is valid. I would suggest plugging some numbers into p and q to see what the scale of mu is under your current priors.
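Here's a minimal sketch of that prior-predictive scale check in plain numpy (total_M and t are placeholders; substitute your own data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Draws matching the priors above: pm.Gamma(alpha=1, beta=3) has scale 1/3
p = rng.gamma(shape=1.0, scale=1.0 / 3.0, size=n)
q = rng.gamma(shape=1.0, scale=1.0 / 3.0, size=n)
total_M = 1e4                       # placeholder for your total_M
m = rng.lognormal(mean=np.log(total_M), sigma=0.25, size=n)
t = 50.0                            # placeholder: a mid-range t in [0, 100]

e = np.exp(-(p + q) * t)
mu = m * ((p + q) ** 2 / p) * (e / (1 + (q / p) * e) ** 2)
print(mu.min(), mu.max())
```

If mu spans many orders of magnitude here, a single Normal likelihood sd has no chance of covering it, and the sampler will struggle.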

In addition, since these two nodes are fully observed (the priors have no effect on observed data):

    t = pm.Uniform('t', lower=0, upper=100, observed=sales.t)
    cid = pm.Categorical('cid', p=np.repeat(1./sales.shape[0], sales.shape[0]), observed=sales.region )

They are no different from just writing

    t = sales.t
    cid = sales.region

Quick follow-up question: can this error also be caused by underflow?

This is happening to me in one case where it looks like one of the observations will be extremely unlikely: I have one subset of the data that has an enormous degree of variation, where nothing else does. Could that cause this symptom?


@rpgoldman: This might not be helpful in your case, but I can tell you this happened to me when a bounded distribution parameter (in my case sd) dropped outside its bounds.

I don’t see that happening in any obvious way, but it is certainly possible. In another context, I believe someone reported that the jitter added to the initial point could push that point outside the support, and that their problem went away when they changed from something like jitter+adapt_diag to just adapt_diag (I forget the precise details).


I’m also getting the same issue. I have a simple linear regression:

with pm.Model() as model:
    theta = pm.Normal('theta', mu=0, sd=10, shape=(predictors.shape[1], 1))
    summation =, theta)
    sigma = pm.HalfCauchy('sigma', beta=1)
    y_obs = pm.Normal('y_obs', mu=summation, sd=sigma, observed=y)  # likelihood (assuming a response vector y)
    traces = pm.sample(tune=1000)

There are 21 predictors; the minimum value is 0 and the maximum is 243. I tried a smaller sd for the theta variables, but it didn’t help, and I also tried pm.Exponential for sigma earlier, which made no difference. I also tried changing the init keyword to adapt_diag. That passed once, but rerunning it, I get the same failure. PyMC3 version 3.6.


Nevermind! I figured it out. It turned out that the sigma should be much bigger than what I had started with. For example, setting sigma to pm.Exponential('sigma', lam=0.01) did the trick. Thanks anyways!
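For reference, pm.Exponential('sigma', lam=0.01) has mean 1/lam = 100, which matches the scale of responses that run up to a few hundred; a quick numpy check:

```python
import numpy as np

rng = np.random.default_rng(1)
# numpy parameterizes the exponential by scale = 1/lam
draws = rng.exponential(scale=1 / 0.01, size=100_000)
print(draws.mean())  # close to 100
```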


I’m also getting the same error:

ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV beta.ravel()[1] is zero.

In my case, I’m building a Weibull AFT model with covariates for survival analysis.

My model is (I have tried to build it following the online tutorial):

def weibull_lccdf(x, alpha, beta):
    ''' Log complementary cdf of Weibull distribution. '''
    return -(x / beta)**alpha

event = data_weibull['cens'].values.astype(bool)

with pm.Model() as model:

    # Priors for unknown model parameters
    rho = pm.Gamma('rho', alpha=0.001, beta=0.001, testval=2.0)
    beta = pm.Normal('beta', mu=0, sd=0.001, shape=2, testval=0.0)

    # Expected value of the scale parameter for observed and censored rows
    lambda_obs = pm.Deterministic('lambda_obs', tt.exp(beta[0] + beta[1]*data_weibull['temperature'][event].values))
    lambda_cens = pm.Deterministic('lambda_cens', tt.exp(beta[0] + beta[1]*data_weibull['temperature'][~event].values))

    # Likelihood (sampling distribution) of observations
    runningtime_obs = pm.Weibull('runningtime_obs', alpha=rho, beta=lambda_obs,
                                 observed=data_weibull['runningtime'][event].values)
    runningtime_cens = pm.Potential('runningtime_cens',
                                    weibull_lccdf(data_weibull['runningtime'][~event].values, rho, lambda_cens))


with model:
    # draw ni posterior samples
    step = pm.NUTS(vars=[rho, beta])
    trace = pm.sample(draws=ni, tune=nb, chains=nc, step=step, cores=1)

I have tried changing the init parameter of pm.sample and also using the Hamiltonian Monte Carlo sampler but the same issue appears. It only ‘works’ with the Metropolis sampler but all the parameters are 0 after the sampling. Any clue about this problem @junpenglao ? Thanks a lot in advance!
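One thing worth ruling out: the weibull_lccdf helper itself is fine, which you can confirm by comparing it against a numerically integrated Weibull density (the parameter values below are arbitrary):

```python
import numpy as np

def weibull_lccdf(x, alpha, beta):
    '''Log complementary cdf (log survival) of the Weibull distribution.'''
    return -(x / beta) ** alpha

alpha, beta, x0 = 1.5, 2.0, 1.0
xs = np.linspace(x0, 40.0, 400_000)
pdf = (alpha / beta) * (xs / beta) ** (alpha - 1) * np.exp(-(xs / beta) ** alpha)
surv = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(xs))  # trapezoid rule
print(np.log(surv), weibull_lccdf(x0, alpha, beta))  # these agree closely
```

Given that, the extremely tight Normal(mu=0, sd=0.001) prior on beta looks like a more plausible suspect, since it pins the covariate effect near zero; it might be worth loosening.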

Are you normalizing your features? I was having the same problem, but after I standardized them with StandardScaler it worked.

From what I researched, the problem can be overflow: overflow errors tend to occur in models that involve logs or exponentials. So if your model requires those, rescale the data first so that the intermediate quantities stay in a numerically safe range.
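Standardizing is also easy to do by hand; this is what sklearn's StandardScaler computes (toy predictor matrix below):

```python
import numpy as np

X = np.array([[0.0, 243.0],
              [10.0, 100.0],
              [5.0, 50.0]])                    # toy predictor matrix
X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # same as StandardScaler
print(X_std.mean(axis=0), X_std.std(axis=0))   # columns: mean ~0, sd 1
```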