Unsupervised clustering: Mass matrix contains zeros on the diagonal

adam · May 15, 2018, 7:43pm

I’m working on the Iris dataset, trying to use pymc3 to divide the petal_width (numpy array) for the versicolor & virginica flowers into 2 separate clusters.
The distribution of the petal_width:

My model for the unsupervised clustering:

data = two_flowers['petal width']
with pm.Model() as model:
    p1 = pm.Uniform('p', 0, 1)    
    p2 = 1 - p1                   
    p = T.stack([p1, p2])         

    assignment = pm.Categorical("assignment", p, 
                                shape=data.shape[0],
                                testval=np.random.randint(0, 2, size=data.shape[0])) 

with model:
    sds = pm.Uniform("sds", 0, 1, shape=2)
    centers = pm.Normal("centers",              
                        mu=np.array([1.3, 2.0]), 
                        sd=np.array([1, 1]), 
                        shape=2)
    
    center_i = pm.Deterministic('center_i', centers[assignment])
    sd_i = pm.Deterministic('sd_i', sds[assignment])            
    
    observations = pm.Normal("obs", mu=center_i, sd=sd_i, observed=data) 

    trace = pm.sample(10000, tune=9000, njobs=1)

But I get the following error:

Mass matrix contains zeros on the diagonal. Some derivatives might always be zero

I tried changing the sds hyperparameters so they are not too close to zero (in case that is the error):
sds = pm.Uniform("sds", 1, 2, shape=2)

But I either get a very high autocorrelation, or inaccurate posteriors because the true sds is under 1.0:

The only way I got this model to converge (and it converged very well) is if I multiplied the petal_width * 100 before modeling, and changed the parameters accordingly:

sds = pm.Uniform("sds", 0, 100, shape=2)
centers = pm.Normal("centers",              
                    mu=np.array([130, 200]), 
                    sd=np.array([10, 10]), 
                    shape=2)

Why does the model work with larger values but give the error with small values?

narendramukherjee · May 16, 2018, 1:47am

I think the reason it works well with big values is exactly what you surmised - the fact that the sd is now much bigger than 1 (and far from small values). Overall, I do not think this is a good way to specify a Gaussian Mixture Model anyways. I would advise:

Specifying the prior on mixing proportions as a Beta distribution in this case (or a Dirichlet generally).
Using the NormalMixture distribution in pymc3 as NUTS becomes hard to use with the Categorical assignments.

See the following notebook for a nice description of this representation of Gaussian Mixture Models:
http://docs.pymc.io/notebooks/marginalized_gaussian_mixture_model.html

adam · May 16, 2018, 2:58am

Thank you narendramukherjee. I feel ill be thanking you a lot in the future. You need a shorter name handle

junpenglao · May 16, 2018, 2:38pm

FYI, since this problem comes up quite a lot and it is a bit difficult to debug, I just send in a PR to make the error output more informative: https://github.com/pymc-devs/pymc3/pull/2977

It will output something like:

which hopefully makes identifying the problem of your model a bit easier.

narendramukherjee · May 16, 2018, 2:49pm

That’s super helpful @junpenglao, thanks for that

adam · May 16, 2018, 5:55pm

Thank you, @junpenglao

Topic		Replies	Views
Help: Issues running a model for bayesian media mix Questions development	1	585	November 29, 2021
Properly sampling mixture models Questions	9	4081	March 26, 2018
Mixture of unigrams model - slow when large number of observations Questions	1	371	January 14, 2021
Hierarchical mixture model with point-mass at zero Questions modeling	2	522	October 21, 2022
Mixture model with sample specific weights Questions bug	1	600	April 12, 2023

Unsupervised clustering: Mass matrix contains zeros on the diagonal

Related Topics