Unsupervised clustering: Mass matrix contains zeros on the diagonal

I’m working on the Iris dataset, trying to use pymc3 to divide the petal_width (numpy array) for the versicolor & virginica flowers into 2 separate clusters.
The distribution of the petal_width:


My model for the unsupervised clustering:

data = two_flowers['petal width']
with pm.Model() as model:
    p1 = pm.Uniform('p', 0, 1)    
    p2 = 1 - p1                   
    p = T.stack([p1, p2])         

    assignment = pm.Categorical("assignment", p, 
                                testval=np.random.randint(0, 2, size=data.shape[0])) 

with model:
    sds = pm.Uniform("sds", 0, 1, shape=2)
    centers = pm.Normal("centers",              
                        mu=np.array([1.3, 2.0]), 
                        sd=np.array([1, 1]), 
    center_i = pm.Deterministic('center_i', centers[assignment])
    sd_i = pm.Deterministic('sd_i', sds[assignment])            
    observations = pm.Normal("obs", mu=center_i, sd=sd_i, observed=data) 

    trace = pm.sample(10000, tune=9000, njobs=1)

But I get the following error:

Mass matrix contains zeros on the diagonal. Some derivatives might always be zero

I tried changing the sds hyperparameters so they are not too close to zero (in case that is the error):
sds = pm.Uniform("sds", 1, 2, shape=2)

But I either get a very high autocorrelation, or inaccurate posteriors because the true sds is under 1.0:

The only way I got this model to converge (and it converged very well) is if I multiplied the petal_width * 100 before modeling, and changed the parameters accordingly:

sds = pm.Uniform("sds", 0, 100, shape=2)
centers = pm.Normal("centers",              
                    mu=np.array([130, 200]), 
                    sd=np.array([10, 10]), 

Why does the model work with larger values but give the error with small values?

I think the reason it works well with big values is exactly what you surmised - the fact that the sd is now much bigger than 1 (and far from small values). Overall, I do not think this is a good way to specify a Gaussian Mixture Model anyways. I would advise:

  1. Specifying the prior on mixing proportions as a Beta distribution in this case (or a Dirichlet generally).
  2. Using the NormalMixture distribution in pymc3 as NUTS becomes hard to use with the Categorical assignments.

See the following notebook for a nice description of this representation of Gaussian Mixture Models:

1 Like

Thank you narendramukherjee. I feel ill be thanking you a lot in the future. You need a shorter name handle :grin:

1 Like

FYI, since this problem comes up quite a lot and it is a bit difficult to debug, I just send in a PR to make the error output more informative: https://github.com/pymc-devs/pymc3/pull/2977

It will output something like:

which hopefully makes identifying the problem of your model a bit easier.


That’s super helpful @junpenglao, thanks for that :slight_smile:

1 Like

Thank you, @junpenglao