Why does MAP vastly outperform sample in Bayesian clustering?

A few code snippets to illustrate the point:

import numpy as np
import pymc as pm

# Create two (seemingly) equivalent mixtures
cov = [[1, 0], [0, 1]]
w = [0.5, 0.5]
# Bivariate components: the length-2 dimension is a core (support) dimension
diag_mvn_mix = pm.Mixture.dist(w, comp_dists=[pm.MvNormal.dist([-10, -10], cov), pm.MvNormal.dist([10, 10], cov)])
# Univariate components: the length-2 dimension is a batch dimension
ind_norm_mix = pm.Mixture.dist(w, comp_dists=[pm.Normal.dist([-10, -10]), pm.Normal.dist([10, 10])])

Each mixture has two components, one very negative and one very positive. Within each component there are two elements, both with the same mean.

Our naive expectation is that both mixtures should only ever draw two positive numbers together, or two negative numbers together. The next snippet tests that theory:

# Draw samples
diag_mvn_samples, ind_norm_samples = pm.draw([diag_mvn_mix, ind_norm_mix], 100_000)

# Percentage of MvNormal draws whose two elements agree in sign
# (a count out of 100_000 divided by 1_000 is a percentage)
(np.all(diag_mvn_samples > 0, axis=1) | np.all(diag_mvn_samples < 0, axis=1)).sum() / 1_000
>>> Out: 100.0

# Percentage of independent-normal draws whose two elements agree in sign
(np.all(ind_norm_samples > 0, axis=1) | np.all(ind_norm_samples < 0, axis=1)).sum() / 1_000
>>> Out: 49.9

What is happening? The independent normals are being mixed element-wise in every dimension, because PyMC treats their length-2 dimension as a batch dimension rather than a core (support) dimension. I was able to grok it by thinking about the generative process. In the MvNormal case, we flip a coin once to choose a component, then sample a whole 2-vector from it. In the independent-Normal case, we traverse the batch dimensions (all dimensions to the left of the core dimension, which here is every dimension) and flip a separate coin for each element. That’s how we end up with mixed signs.

Note that this doesn’t happen in the MvNormal case only because PyMC knows that it’s a multivariate distribution. If we had an additional batch dimension (for example, if we wanted to sample a 3-tuple of (x, y) coordinates), we would see the same “multi-flipping” behavior in the MvNormal case. Both points are demonstrated in the sketches below.
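To make the coin-flip story concrete, here is a minimal NumPy sketch of the two generative processes (pure NumPy, not PyMC internals; the seed and draw count are arbitrary):

import numpy as np

rng = np.random.default_rng(42)
n = 100_000
means = np.array([[-10, -10], [10, 10]])

# MvNormal-style mixture: ONE flip per draw selects a whole 2-vector
flips = rng.integers(2, size=n)
mvn_style = means[flips] + rng.normal(size=(n, 2))

# Independent-normal-style mixture: a SEPARATE flip for every element
elem_flips = rng.integers(2, size=(n, 2))
ind_style = means[elem_flips, np.arange(2)] + rng.normal(size=(n, 2))

def pct_sign_agree(x):
    return (np.all(x > 0, axis=1) | np.all(x < 0, axis=1)).mean() * 100

pct_sign_agree(mvn_style)  # ~100.0
pct_sign_agree(ind_style)  # ~50.0

And here is the batched-MvNormal point checked directly (reusing w and cov from above; the batch size and draw count are just illustrative):

# Each draw is now three (x, y) pairs: batch shape 3, core shape 2
batched_mix = pm.Mixture.dist(
    w,
    comp_dists=[
        pm.MvNormal.dist([-10, -10], cov, shape=(3, 2)),
        pm.MvNormal.dist([10, 10], cov, shape=(3, 2)),
    ],
)
batched_draws = pm.draw(batched_mix, 10_000)  # shape (10_000, 3, 2)

# Each (x, y) pair still agrees internally (core dimension)...
(np.sign(batched_draws[..., 0]) == np.sign(batched_draws[..., 1])).mean()  # ~1.0

# ...but the three pairs get three separate flips, so all three agree
# only about 2 * (1/2)**3 = 25% of the time
pair_is_positive = batched_draws[..., 0] > 0
(pair_is_positive.all(axis=1) | (~pair_is_positive).all(axis=1)).mean()  # ~0.25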

This also has consequences for logp evaluations: it means the non-sorted coordinate (y in your case) cannot become “attached” to a single x, and you will observe label switching. Here is an example of my results using independent normals:

[image: posterior cluster locations from the independent-normal model]

You can see that the x coordinates are all correct, but wherever clusters are aligned vertically there is mode switching in the y dimension. Using MvNormal components prevents this; a sketch of such a model follows.
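For completeness, here is a minimal sketch of a clustering model written with MvNormal components, so each (x, y) mean is a single core-dimension vector. The toy data, the priors, and the ordered transform on the x coordinates are illustrative assumptions, not your original model:

import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
# Toy 2-cluster data standing in for the real dataset
data = np.concatenate([
    rng.normal([-10, -10], 1.0, size=(250, 2)),
    rng.normal([10, 10], 1.0, size=(250, 2)),
])

with pm.Model() as model:
    w = pm.Dirichlet("w", a=np.ones(2))

    # Sorting the x coordinates breaks the label symmetry between clusters
    mu_x = pm.Normal(
        "mu_x", 0, 10, shape=2,
        transform=pm.distributions.transforms.ordered,
        initval=np.array([-1.0, 1.0]),
    )
    mu_y = pm.Normal("mu_y", 0, 10, shape=2)

    # MvNormal components: each (x, y) mean is one core-dimension vector,
    # so y stays attached to its x and cannot switch modes on its own
    comps = [
        pm.MvNormal.dist(mu=pm.math.stack([mu_x[k], mu_y[k]]), cov=np.eye(2))
        for k in range(2)
    ]
    obs = pm.Mixture("obs", w=w, comp_dists=comps, observed=data)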
