GSoC 2022: Continuation of Dirichlet Process + Mixture Support

Hi all,

After working with PyMC over the past several months, I would like to apply to GSoC 2022. Last summer, I started a long-term project on adding Dirichlet Process (DP) functionality to PyMC. DPs belong to the broad class of Bayesian nonparametric (BNP) methods, which also includes Gaussian processes and Bayesian Additive Regression Trees. DPs are nonparametric in that users can avoid specifying distributional assumptions, since random draws from a DP prior are themselves distributions. My progress so far consists of adding a truncated stick-breaking distribution to the codebase (see PR 5200), also known as the GEM distribution in the BNP literature.
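For anyone unfamiliar with the construction, here is a minimal sketch of truncated stick-breaking in plain Python. It is only an illustration and not the implementation in PR 5200; the function name `truncated_stick_breaking` and its arguments `alpha`, `K`, and `rng` are my own choices for this example. Each step breaks off a Beta(1, alpha) fraction of whatever stick is left, and the leftover after `K` breaks becomes the final weight so that the weights sum to one.

```python
import random


def truncated_stick_breaking(alpha, K, rng):
    """Draw K + 1 mixture weights from a truncated stick-breaking process.

    alpha: concentration parameter (> 0); hypothetical argument name.
    K: number of breaks; the leftover stick becomes weight K + 1.
    rng: a random.Random instance, so draws are reproducible.
    """
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(K):
        beta = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(beta * remaining)
        remaining *= 1.0 - beta
    weights.append(remaining)  # truncation: assign the leftover mass to the last weight
    return weights


rng = random.Random(0)
w = truncated_stick_breaking(alpha=2.0, K=10, rng=rng)
print(len(w), sum(w))
```

Smaller values of `alpha` tend to put most of the mass on the first few weights, while larger values spread it across more components.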

DPs are most often used in the form of a DP mixture, so a prospective submodule for DPs would benefit from good support for (finite) mixture distributions. A recent refactoring effort led by @ricardoV94 (see PR 5438) reintroduces mixture functionality to PyMC v4, but more improvements can be made on the aeppl side. For GSoC 2022, I hope to continue my work on building an API for DPs while also building on current efforts to generalize aeppl’s capacity for manipulating mixture stacks and, more broadly, for having multivariate distributions as components. This project would also be a good opportunity to further familiarize myself with aesara and aeppl, which, admittedly, have been a challenge for me to grasp.
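To make the mixture part concrete, here is a rough sketch of the log-density of a finite mixture of univariate normals, computed with the log-sum-exp trick. This is plain Python for illustration only; it does not use or mirror the PyMC or aeppl APIs, and the helper names `normal_logpdf` and `mixture_logpdf` are made up for this example. In a truncated DP mixture, the `weights` would come from a stick-breaking draw.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log-density of a univariate normal with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def mixture_logpdf(x, weights, mus, sigmas):
    """Log-density of a finite normal mixture: log sum_k w_k * N(x | mu_k, sigma_k)."""
    # Components with zero weight contribute nothing, so skip them to avoid log(0).
    terms = [
        math.log(w) + normal_logpdf(x, m, s)
        for w, m, s in zip(weights, mus, sigmas)
        if w > 0.0
    ]
    # Log-sum-exp: subtract the largest term before exponentiating for numerical stability.
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


print(mixture_logpdf(0.3, [0.7, 0.3], [0.0, 4.0], [1.0, 2.0]))
```

Multivariate components would replace `normal_logpdf` with a multivariate log-density, which is where the component-support work mentioned above comes in.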

If anyone has any comments or suggestions, I’m always happy to hear them!


CC’ing @AustinRochford @fonnesbeck @ricardoV94 @brandonwillard