I'm working with the Iris dataset and trying to use pymc3 to split the petal_width values (a NumPy array) of the versicolor and virginica flowers into two separate clusters.
The distribution of the petal_width:
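To make the setup reproducible, here is a minimal sketch that selects the versicolor and virginica petal widths and prints a text histogram of their distribution. It assumes that scikit-learn's bundled copy of Iris (`sklearn.datasets.load_iris`) is equivalent to the `two_flowers` DataFrame. The names `petal_width` and `species` and the 0.1 cm bins are illustrative choices, not part of the original code.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: scikit-learn's copy of Iris stands in for `two_flowers`
iris = load_iris()
keep = np.isin(iris.target, [1, 2])      # 1 = versicolor, 2 = virginica
petal_width = iris.data[keep, 3]         # column 3 = 'petal width (cm)'
species = iris.target[keep]

# Text-only view of the distribution: 16 bins of width 0.1 cm, centred on 1.0 ... 2.5
counts, edges = np.histogram(petal_width, bins=np.linspace(0.95, 2.55, 17))
for lo, n in zip(edges[:-1], counts):
    print(f"{lo + 0.05:.1f} cm  {'#' * int(n)}")
```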
My model for the unsupervised clustering:
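```python
import numpy as np
import pymc3 as pm
import theano.tensor as T

# two_flowers: DataFrame with only the versicolor and virginica rows
data = two_flowers['petal width']

with pm.Model() as model:
    # Mixture weights for the two clusters
    p1 = pm.Uniform('p', 0, 1)
    p2 = 1 - p1
    p = T.stack([p1, p2])
    # Latent cluster label for each observation
    assignment = pm.Categorical("assignment", p,
                                shape=data.shape,
                                testval=np.random.randint(0, 2, size=data.shape))

with model:
    # Per-cluster standard deviations and centers
    sds = pm.Uniform("sds", 0, 1, shape=2)
    centers = pm.Normal("centers",
                        mu=np.array([1.3, 2.0]),
                        sd=np.array([1, 1]),
                        shape=2)
    # Each observation's center and sd, looked up from its cluster label
    center_i = pm.Deterministic('center_i', centers[assignment])
    sd_i = pm.Deterministic('sd_i', sds[assignment])
    observations = pm.Normal("obs", mu=center_i, sd=sd_i, observed=data)
    trace = pm.sample(10000, tune=9000, njobs=1)
```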
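For reference, the generative process that this model encodes can be sketched in plain NumPy: each flower draws a cluster label with probabilities `p`, then its petal width is drawn from a Normal with that cluster's center and standard deviation. The parameter values below are illustrative assumptions, not fitted results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter values (assumptions, not posterior estimates)
p = np.array([0.5, 0.5])          # mixture weights, like T.stack([p1, 1 - p1])
centers = np.array([1.3, 2.0])    # cluster means (cm)
sds = np.array([0.2, 0.27])       # cluster standard deviations (cm)

n = 100
assignment = rng.choice(2, size=n, p=p)   # latent cluster label per flower
center_i = centers[assignment]            # same indexing as centers[assignment]
sd_i = sds[assignment]                    # same indexing as sds[assignment]
obs = rng.normal(center_i, sd_i)          # simulated petal widths
```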
But I get the following error:
```
Mass matrix contains zeros on the diagonal. Some derivatives might always be zero
```
I tried changing the bounds of the sds prior so that the values cannot get too close to zero (in case that is what causes the error):

```python
sds = pm.Uniform("sds", 1, 2, shape=2)
```
But then I get either very high autocorrelation or inaccurate posteriors, because the true standard deviations are below 1.0.
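A quick check of the per-species spread shows why a `Uniform(1, 2)` prior cannot work here: it assigns zero density below 1, while both groups' petal-width standard deviations are well under 1 cm. This sketch assumes scikit-learn's bundled Iris data matches the data used in the question.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
petal_width = iris.data[:, 3]              # 'petal width (cm)'

stats = {}
for label in (1, 2):                       # 1 = versicolor, 2 = virginica
    x = petal_width[iris.target == label]
    name = str(iris.target_names[label])
    stats[name] = (x.mean(), x.std(ddof=1))  # sample mean and sample sd
    print(f"{name}: mean={x.mean():.3f} cm, sd={x.std(ddof=1):.3f} cm")
```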
The only way I got this model to converge (and it converged very well) was to multiply petal_width by 100 before modeling and adjust the priors accordingly:

```python
sds = pm.Uniform("sds", 0, 100, shape=2)
centers = pm.Normal("centers", mu=np.array([130, 200]), sd=np.array([10, 10]), shape=2)
```
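Under a linear rescaling y = 100·x, a Normal(mu, sd) prior on y corresponds to Normal(mu/100, sd/100) on x, and Uniform(0, b) corresponds to Uniform(0, b/100). The arithmetic below maps the rescaled priors back to centimetres. It shows that the centers prior with sd=10 is equivalent to sd=0.1 cm, ten times tighter than the sd=1 used on the original scale, so the two runs are not exactly the same model. Posterior samples of `centers` and `sds` from the rescaled model can likewise be divided by 100 to return to centimetres.

```python
import numpy as np

scale = 100.0

# Priors used with the rescaled data (petal_width * 100)
scaled_center_mu = np.array([130.0, 200.0])
scaled_center_sd = np.array([10.0, 10.0])
scaled_sds_upper = 100.0

# Equivalent priors on the original (cm) scale
print("centers mu (cm):", scaled_center_mu / scale)  # [1.3 2. ]
print("centers sd (cm):", scaled_center_sd / scale)  # [0.1 0.1]
print("sds upper (cm):", scaled_sds_upper / scale)   # 1.0
```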
Why does the model work with larger values but give the error with small values?