Hello!

As mentioned in my last post, I was warned that sampling mixture models is dangerous, so I'd appreciate advice on possible tweaks to the model below. It is an unsupervised clustering model.

```python
import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt

with pm.Model() as Model_:
    # A covariance matrix for each potential cluster, relating all features of the data
    Lower = tt.stack([pm.LKJCholeskyCov('Sigma_{}'.format(k), n=NumberOfFeatures_, eta=2.,
                                        sd_dist=pm.HalfCauchy.dist(2.5))
                      for k in range(NumberOfClusters_)])
    Chol = tt.stack([pm.expand_packed_triangular(NumberOfFeatures_, Lower[k])
                     for k in range(NumberOfClusters_)])
    # The center of each cluster
    Mus = tt.stack([pm.Normal('Mu_{}'.format(k), 0.1, 10., shape=NumberOfFeatures_)
                    for k in range(NumberOfClusters_)])
    # W_Min_Potential = pm.Potential('L_potential', tt.switch(tt.min(Weights) < .1, -np.inf, 0))
    # The multivariate normal distribution for each cluster
    MultivariateNormals = [pm.MvNormal.dist(Mus[k], chol=Chol[k], shape=X_.shape)
                           for k in range(NumberOfClusters_)]
    # The weight of each cluster, i.e. how much it contributes to the mixture
    Weights = pm.Dirichlet('w', np.ones(NumberOfClusters_) / NumberOfClusters_)
    # Due to software bugs, marginalize the cluster assignments by hand:
    # per-observation, per-component log-densities, combined via logsumexp
    logpcomp = tt.stack([Dist.logp(theano.shared(X_)) for Dist in MultivariateNormals], axis=1)
    Prob = pm.Deterministic("Probabilities", logpcomp)
    pm.Potential('logp', pm.math.logsumexp(tt.log(Weights) + logpcomp, axis=-1).sum())
```

I'm finding that the number of effective samples is quite low and the Gelman-Rubin statistic is high. Any help would be greatly appreciated, thank you!