Why does MAP vastly outperform sample in Bayesian clustering?

In PyMC, a mixture of MvNormal components is not equivalent to a mixture of univariate Normal components, even when the MvNormal covariance is the identity. A mixture of univariate Normals treats the x and y values of each pair as separate draws, so the x of a point can be assigned to one component and its y to another. That cannot happen with MvNormal components, where both x and y of a point must come from the same component.
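To make the difference concrete, here is a minimal pure-Python sketch (no PyMC) that evaluates both densities by hand. The means, weights, and test point are made-up values for illustration, and the two functions are only my reading of the two model structures described above, not PyMC's internal code.

```python
import math

def normal_pdf(v, mu, sigma=1.0):
    """Density of a univariate Normal(mu, sigma) at v."""
    z = (v - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical setup: two components, equal weights, identity covariance.
weights = [0.5, 0.5]
means = [(-2.0, -2.0), (2.0, 2.0)]  # (mu_x, mu_y) for each component

def joint_mixture_pdf(x, y):
    """Mixture of bivariate normals: x and y share one component label."""
    return sum(w * normal_pdf(x, mx) * normal_pdf(y, my)
               for w, (mx, my) in zip(weights, means))

def per_coordinate_mixture_pdf(x, y):
    """Separate univariate mixtures: x and y each get their own label."""
    px = sum(w * normal_pdf(x, mx) for w, (mx, _) in zip(weights, means))
    py = sum(w * normal_pdf(y, my) for w, (_, my) in zip(weights, means))
    return px * py

# A point whose x matches the first component and whose y matches the second.
point = (-2.0, 2.0)
p_joint = joint_mixture_pdf(*point)
p_indep = per_coordinate_mixture_pdf(*point)
print(f"joint mixture density:          {p_joint:.3e}")
print(f"per-coordinate mixture density: {p_indep:.3e}")
```

The "mismatched" point is very unlikely under the joint mixture but perfectly plausible under the per-coordinate one, which is exactly the extra freedom the univariate version has.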

In your model, this means that once the ordering constraint resolves the labeling of x, the labeling of y is identified as well, because each y is tied to the same component as its x. If you use univariate Normals, y is still subject to label switching because its component assignment is independent of x.
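A rough way to check this by hand is to swap only the y means and see whether the log-likelihood changes. The sketch below uses a tiny made-up data set and assumes equal weights and unit variances, so treat it as an illustration of the argument rather than a reproduction of your model.

```python
import math

def normal_pdf(v, mu):
    """Density of a univariate Normal(mu, 1) at v."""
    return math.exp(-0.5 * (v - mu) ** 2) / math.sqrt(2.0 * math.pi)

# Hypothetical data: two well-separated clusters of (x, y) pairs.
data = [(-2.1, -1.8), (-1.7, -2.3), (2.2, 1.9), (1.8, 2.4)]

def joint_loglik(mu_x, mu_y):
    """Equal-weight mixture of bivariate normals (shared label for x and y)."""
    total = 0.0
    for x, y in data:
        total += math.log(sum(0.5 * normal_pdf(x, mx) * normal_pdf(y, my)
                              for mx, my in zip(mu_x, mu_y)))
    return total

def per_coordinate_loglik(mu_x, mu_y):
    """Equal-weight univariate mixtures (separate labels for x and y)."""
    total = 0.0
    for x, y in data:
        px = sum(0.5 * normal_pdf(x, m) for m in mu_x)
        py = sum(0.5 * normal_pdf(y, m) for m in mu_y)
        total += math.log(px * py)
    return total

mu_x = [-2.0, 2.0]          # already ordered, as the constraint enforces
mu_y = [-2.0, 2.0]
mu_y_swapped = [2.0, -2.0]  # relabel y only, leave x alone

indep_gap = per_coordinate_loglik(mu_x, mu_y) - per_coordinate_loglik(mu_x, mu_y_swapped)
joint_gap = joint_loglik(mu_x, mu_y) - joint_loglik(mu_x, mu_y_swapped)
print("per-coordinate change after swapping y means:", indep_gap)
print("joint change after swapping y means:", joint_gap)
```

With the per-coordinate likelihood the swap makes no difference, so ordering x does nothing to pin down y. With the joint likelihood the swapped labeling is heavily penalized, so ordering x is enough.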
