Understanding error while sampling a model with conditional probabilities


I am trying to model a dataset where each object has a variable number of features.
In particular, I am trying to build on a finite approximation of a beta process-Bernoulli process type of model, where each object has a variable-length list of values drawn from a normal distribution, and where the rest of the list is filled with dummy values.

My example model is as follows:

import pymc3 as pm
import numpy as np
import scipy.stats.distributions as dist
import theano.tensor as tt
D = 100
A = 4
K = 15

poissons = dist.poisson.rvs(A, size=D)
normals = []
for p in poissons:
    normals.append(np.concatenate([dist.norm.rvs(0, 1, size=p), -999 * np.ones(K - p)]))
normals = np.asarray(normals)

with pm.Model() as model:
    pis = pm.Beta('pis', A/K, 1, shape=K)
    bs = pm.Bernoulli('bs', pis, shape=K)
    ns = pm.Normal('ns', 0, 1, shape=K)
    errs = pm.Normal('errs', -999, 1e-10, shape=K)
    vs = pm.Normal('vs', tt.switch(tt.eq(bs, 0), ns, errs), 1, observed=normals)

Note that it tries to model the contained values with two different distributions: one for normal values ‘ns’ and one for dummy values ‘errs’. It seems to work fine if I simply have the switch inside the mean of a single Normal, but I would like in the future to have different types of distributions for ‘ns’ and ‘errs’.

Currently, when sampling:

with model:
    trace = pm.sample(10)

I get the following error:

ValueError: Mass matrix contains zeros on the diagonal. Some derivatives might always be zero

Are there any suggestions on how to possibly avoid such error, or how to change the model so it works for the particular use case?

Thank you in advance!

Please note that I get the same error if I replace the tt.switch(tt.eq(bs, 0), ns, errs) part with bs * ns + (1 - bs) * errs, which I would have expected to be a totally standard operation.
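As an aside, note that bs * ns + (1 - bs) * errs selects ns where bs == 1, whereas the switch selects ns where bs == 0; the masking expression that matches the switch is (1 - bs) * ns + bs * errs. A quick NumPy check of the selection logic (the toy values are illustrative):

```python
import numpy as np

# Toy values standing in for the model's variables.
bs = np.array([0, 1, 0, 1])
ns = np.array([0.1, 0.2, 0.3, 0.4])
errs = np.full(4, -999.0)

# tt.switch(tt.eq(bs, 0), ns, errs) picks ns where bs == 0.
switched = np.where(bs == 0, ns, errs)

# The masking form matching the switch weights ns by (1 - bs).
masked = (1 - bs) * ns + bs * errs
assert np.allclose(switched, masked)

# The other ordering selects the opposite entries.
flipped = bs * ns + (1 - bs) * errs
assert not np.allclose(switched, flipped)
```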

As mentioned in a previous thread, this may be due to an overflow somewhere. Try going down the hierarchy with some dummy values to see the scale of the variables' values as you go deeper.

Thanks for the response.

Do you have an example of how to use dummy values here? I tried to read the other thread, but it was not obvious to me.

If you update to the current master branch, it will give you a more informative error message showing you which RV is hitting the zero grad error.

Looking at the code, it is likely that the RV errs has too little uncertainty. As @rlouf suggests, you can try replacing it with a constant array: errs = np.ones(K) * (-999)


Great, thanks!

I was trying to experiment a bit myself, and it does seem that NUTS is more sensitive to low standard deviation than other MCMC methods.

Sure! What I just did is:

pis = dist.beta.rvs(4/15, 1, size=15)
bs = dist.bernoulli.rvs(pis)
ns = dist.norm.rvs(0, 1, size=15)
errs = dist.norm.rvs(-999, 1e-10, size=15)
vs = []
for b, n, e in zip(bs, ns, errs):
    if b == 0:
        mu = n
    else:
        mu = e
    vs.append(dist.norm.rvs(mu, 1))

With the variables you provided, your dummy data will generate a lot of -999 values. On the other hand, the Beta distribution generates a lot of very small values. These are not really compatible, and I suspect the model is too tightly constrained as it is. Try putting a hyperprior on the first parameter of the Beta.


Awesome, thanks for the example and suggestions.