Issue with a 2-D marginalized GMM

Hi!

I am trying to implement a 2-D marginalized GMM as follows:

p(x_n \mid \pi, \mu, \tau) = \sum_{k=1}^K \pi_k \, \text{Normal}(x_n \mid \mu_k, \tau_k)

p(\pi) = \text{Dirichlet}(\pi \mid \alpha \mathbf{1}_K)

p(\mu_k) = \text{Normal}(\mu_k \mid 0, I)

p(\tau_k) = \text{Gamma}(\tau_k \mid a, b)

My data X has shape (500, 2).

According to an example in pymc3/tests/test_mixture.py:

def test_normal_mixture_nd(self):
    nd, ncomp = 3, 5

    with Model() as model0:
        mus = Normal('mus', shape=(nd, ncomp))
        taus = Gamma('taus', alpha=1, beta=1, shape=(nd, ncomp))
        ws = Dirichlet('ws', np.ones(ncomp))
        mixture0 = NormalMixture('m', w=ws, mu=mus, tau=taus, shape=nd)

    with Model() as model1:
        mus = Normal('mus', shape=(nd, ncomp))
        taus = Gamma('taus', alpha=1, beta=1, shape=(nd, ncomp))
        ws = Dirichlet('ws', np.ones(ncomp))
        comp_dist = [Normal.dist(mu=mus[:, i], tau=taus[:, i])
                     for i in range(ncomp)]
        mixture1 = Mixture('m', w=ws, comp_dists=comp_dist, shape=nd)

    testpoint = model0.test_point
    testpoint['mus'] = np.random.randn(nd, ncomp)
    assert_allclose(model0.logp(testpoint), model1.logp(testpoint))
    assert_allclose(mixture0.logp(testpoint), mixture1.logp(testpoint))

I create the model as follows:

import numpy as np
import pymc3 as pm

# set up model
nd = 2
ncomp = 3
with pm.Model() as model0:
    mus = pm.Normal('mus', shape=(nd, ncomp))
    taus = pm.Gamma('taus', alpha=1, beta=1, shape=(nd, ncomp))
    ws = pm.Dirichlet('ws', np.ones(ncomp))
    mixture0 = pm.NormalMixture('m', w=ws, mu=mus, tau=taus, observed=X)

But it throws an error:

Input dimension mis-match. (input[0].shape[0] = 500, input[1].shape[0] = 2)

Could you guys provide some help with this problem? Many thanks!

An ugly workaround:

# set up model
nd = 2
ncomp = 3
with pm.Model() as model0:
    mus = pm.Normal('mus', shape=(nd, ncomp))
    taus = pm.Gamma('taus', alpha=1, beta=1, shape=(nd, ncomp))
    ws = pm.Dirichlet('ws', np.ones(ncomp))
    mixture0 = pm.NormalMixture('m',
                                w=ws,
                                mu=mus,
                                tau=taus,
                                # dummy data with an explicit trailing axis
                                observed=np.random.randn(500, 2, 1))

This is because the observed values need to be evaluated against each component of comp_dists. Currently we automatically add a dimension on the right for 1-D observed data, but this does not yet work for multivariate observed data…
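For reference, a minimal sketch of applying that workaround to the real data rather than dummy data (assuming X has shape (500, 2) as above): adding a trailing axis of length 1 lets the observed values broadcast against each mixture component.

import numpy as np
import pymc3 as pm

X = np.random.randn(500, 2)  # stand-in for the real data

nd, ncomp = 2, 3
with pm.Model() as model0:
    mus = pm.Normal('mus', shape=(nd, ncomp))
    taus = pm.Gamma('taus', alpha=1, beta=1, shape=(nd, ncomp))
    ws = pm.Dirichlet('ws', np.ones(ncomp))
    # X[:, :, None] has shape (500, 2, 1), which broadcasts against the
    # (nd, ncomp) component parameters to give a (500, 2, 3) logp
    mixture0 = pm.NormalMixture('m', w=ws, mu=mus, tau=taus,
                                observed=X[:, :, None])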

Thanks for the reply. I used your code above, but the inference performance is not good, so I turned to pm.Mixture.

I tried to create a diagonal matrix from the vector tau, but pymc3.math does not seem to have a method for creating a diagonal matrix. Hence I used Theano's method, but it does not seem to be compatible with PyMC3's input types.

The code is as follows, and it throws the error ('AllocDiag only works on vectors', [tau0]).

import theano.tensor as tt

# set up model
K = 3
with pm.Model() as model0:
    pi = pm.Dirichlet('pi', np.ones(K))

    comp_dist = []
    mu = []
    tau = []
    precision = []
    for i in range(K):
        mu.append(pm.Normal('mu%i' % i, 0, 10, shape=2))
        tau.append(pm.Gamma('tau%i' % i, 1, 1, shape=2))

        precision.append(tt.nlinalg.AllocDiag()(tau))
        comp_dist.append(pm.MvNormal.dist(mu=mu[i], tau=precision[i]))

    xobs = pm.Mixture('x_obs', pi, comp_dist, observed=X)

Do you have any suggestions? Thanks!

Using Mixture and NormalMixture is equivalent in this case, which means you will have the same inference problems. What kind of inference problem are you having?

I am using a Gaussian mixture model to perform image segmentation.

More specifically, the data is an image transformed into a pixel matrix of shape (width * height, 3), where 3 is the number of channels. Hence N = width * height and D = 3.
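(For concreteness, that transformation is just a reshape; img here is a hypothetical (height, width, 3) RGB array:)

# flatten an RGB image into an (N, 3) pixel matrix, N = width * height
pixels = img.reshape(-1, 3)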

The problem is twofold: first, ADVI seems to have a "mode collapse" problem, in that it only captures a few clusters; second, the inferred cluster means (a list of (R, G, B) tuples) do not seem to match the original image.

You can find the segmented figure and the original figure below.

[figure: segmented image vs. original image]

But MCMC gives me a rather good result (I used Gibbs sampling from Edward).

However, the plot of ADVI's hist shows that the ELBO has converged. I tried a few things to figure out why ADVI fails to give a good segmented figure:

  1. The optimization is stuck at a local optimum. However, I tried different initializations and increased the number of clusters; it did not help much.

  2. The model assumption is not good enough. But since MCMC works, this should not actually be the problem.

  3. I also tried FullRankADVI and an LKJ prior on the covariance matrix. That did not help much either.

Could you please provide some insights into this task? I really appreciate your help!

Why not use sampling in PyMC3 if sampling is working fine?
It is hard to say without the data and more information, but in general I would try to compare the estimates from MCMC and ADVI, for example by doing a scatter plot of all the estimated parameters (x-axis being MCMC, y-axis being ADVI).
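A minimal sketch of such a comparison (here trace is an MCMC trace and approx is a fitted ADVI approximation for the same model; both names are placeholders):

import matplotlib.pyplot as plt

advi_trace = approx.sample(1000)  # draws from the variational posterior
mcmc_est = trace['mus'].mean(axis=0).ravel()
advi_est = advi_trace['mus'].mean(axis=0).ravel()

# points far from the y = x line show where ADVI disagrees with MCMC
plt.scatter(mcmc_est, advi_est)
lo, hi = mcmc_est.min(), mcmc_est.max()
plt.plot([lo, hi], [lo, hi], 'k--')
plt.xlabel('MCMC estimate')
plt.ylabel('ADVI estimate')
plt.show()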

I tried to use NUTS in PyMC3, but it takes forever.

Thanks for your suggestions on the comparison; I will try it! In fact, I used PSIS to diagnose ADVI, but evaluating the model's joint density p(\theta, X) also takes a lot of time.

Do you have any ideas for speeding up NUTS and PSIS (evaluating the model likelihood)?

You can try profiling the model to check for slowness. If both NUTS and PSIS are slow, it is a strong indication that the logp implementation should be improved.
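For example, a minimal sketch using Model.profile, which wraps Theano's profiler (model0 here refers to the model defined above):

# profile repeated evaluations of the model's log-probability graph;
# the summary shows which Theano ops dominate the runtime
prof = model0.profile(model0.logpt)
prof.summary()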

For NUTS, more informative priors generally help. Otherwise, the causes are case by case, and you need to identify them specifically for your model.
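For instance, a minimal sketch of making the location prior more informative by centering it on the data (the specific numbers are assumptions, not a recommendation):

import numpy as np
import pymc3 as pm

X = np.random.randn(500, 2)  # stand-in for the real data
nd, ncomp = 2, 3

with pm.Model() as model0:
    # center the component means on the data and give them a moderate
    # spread, instead of the default standard-normal prior
    mus = pm.Normal('mus',
                    mu=X.mean(axis=0)[:, None],
                    sd=2. * X.std(axis=0)[:, None],
                    shape=(nd, ncomp))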

Also, mixture models are hard to do inference on in general (there are plenty of discussions about this on Discourse).

Thank you so much! I will try your suggestions.

There is one last question that I have been thinking about for a while: how can I create a diagonal matrix from a vector in PyMC3? That is, I have

tau = pm.Gamma('tau%i' % i, 1, 1, shape=2)

And I want to do

precision = diagonal_matrix(tau) # pseudo-code

I found that pymc3.math only supports the operation of extracting diagonal elements.

This works for me:

import theano.tensor as tt

tau = pm.Gamma('tau', 1, 1, shape=2)
c = tt.nlinalg.alloc_diag(tau)
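Plugged back into the mixture above, that looks roughly like the following. Note that alloc_diag expects a single vector, so inside the loop it must be applied to this component's tau rather than the whole Python list, which is what triggered the AllocDiag error earlier:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

X = np.random.randn(500, 2)  # stand-in for the real data

K = 3
with pm.Model() as model0:
    pi = pm.Dirichlet('pi', np.ones(K))

    comp_dist = []
    for i in range(K):
        mu = pm.Normal('mu%i' % i, 0, 10, shape=2)
        tau = pm.Gamma('tau%i' % i, 1, 1, shape=2)
        # build a 2x2 diagonal precision matrix from this component's tau
        precision = tt.nlinalg.alloc_diag(tau)
        comp_dist.append(pm.MvNormal.dist(mu=mu, tau=precision))

    xobs = pm.Mixture('x_obs', w=pi, comp_dists=comp_dist, observed=X)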

Thanks!