2D Gaussian Mixture

Hello, new to PyMC. Is it possible to fit a two-dimensional Gaussian mixture model?

If you mean a multivariate Gaussian of length 2, then yes, it's possible to fit a mixture of them. Check out the pymc.Mixture page in the PyMC 5.9.1 documentation.

Thanks for the reply @ricardoV94. The pymc.Mixture example you linked appears to be a multidimensional mixture in the sense that each component of the vector is itself a mixture of Gaussians. I'm looking to fit data that looks like this:
[image]
where each 2-dimensional vector is drawn from a mixture of multivariate Gaussians.

Mixture accepts arbitrary distributions as components; in your case you want to pass MvNormal dists as the components. The last example in the docstring is a mixture of Dirichlets, which is also multivariate.

When you have multivariate components there is no mixing within the core dimensions: each 2-dimensional observation is assigned to a component as a whole vector.

It is possible, and from the plot above I assume you want to use it for clustering? I have been looking into the same application too, and there is a post about it here, for instance:

Note several things:

1- I use univariate normals (because the application I have in mind will later require that), but it is easy to change to MvNormal.

2- sklearn also has sklearn.mixture.GaussianMixture, which is not MCMC based but is instead fit with EM (expectation maximization). It makes a nice baseline comparison for whatever you come up with. I find that writing a model in pymc and using MAP, with initial conditions supplied from, say, kmeans, produces almost identical results. Doing it yourself has the added benefit that you can compute the likelihood of a point belonging to each cluster and put meaningful thresholds on your predictions. You can also compute the likelihood of each of your cluster centers belonging to the other clusters and use that as a basis to determine the number of clusters (you will still need to pick arbitrary thresholds to some extent, but oh well). Of course, you can also modify your likelihood to address more general cases.

3- MCMC based mixture fitting in higher dimensions seems to be quite a tricky issue due to "label switching" and multi-modality, which can be remedied to some extent if your priors are more informed and/or you use MAP. See:
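To make the EM baseline from point 2 concrete, here is a rough sketch using sklearn.mixture.GaussianMixture. The synthetic data and the 0.95 cutoff are assumptions chosen for illustration. It shows the per-point membership probabilities, a threshold on them, and the same probabilities evaluated at the fitted cluster centers.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 2D data from two well-separated clusters (illustrative only).
rng = np.random.default_rng(0)
X = np.concatenate([
    rng.multivariate_normal([-3.0, -3.0], np.eye(2), size=100),
    rng.multivariate_normal([3.0, 3.0], np.eye(2), size=100),
])

# Fit a 2-component Gaussian mixture with EM.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Posterior probability of each point belonging to each component,
# shape (n_points, n_components); each row sums to 1.
probs = gmm.predict_proba(X)

# Keep a label only when the model is confident; -1 marks "unassigned".
# The 0.95 cutoff is an arbitrary choice.
threshold = 0.95
confident = probs.max(axis=1) > threshold
labels = np.where(confident, probs.argmax(axis=1), -1)

# Membership probabilities of each fitted center under every component.
# Large off-diagonal values would suggest overlapping clusters.
center_probs = gmm.predict_proba(gmm.means_)
```

If the off-diagonal entries of `center_probs` are not small, that is a hint that two components may be describing the same cluster, though the cutoff for "small" is still a judgment call.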

@zach my response here might be relevant to this thread as well: