Dirichlet Processes for GSoC 2019

yell · March 28, 2019, 8:42pm

Hi!
My name is Yelysei, and I’m a Master’s student in Computational Science and Engineering at TU Munich.
I would really like to take Dirichlet Processes to the next level in PyMC this summer.

Some relevant background: in my Bachelor’s thesis I wrote an overview of Bayesian methods in ML, in which I also used some PyMC models (PPCA, BPCA). I have more than a year of experience implementing custom models in TensorFlow, such as energy-based models and discrete variational autoencoders. In particular, I learned about DPs and HDPs from this paper and was amazed by how cool they are. My GitHub.

I’ve already set up a dev environment and had a small PR merged. I carefully read the dev guide, got a feel for the core infrastructure, and of course saw the DP notebook.

Some thoughts about what can be done:

- Perhaps the first step is to encapsulate the stick-breaking process, something like DirichletProcess from Edward, which could also be useful for the upcoming PyMC4.
- Then develop other sampling algorithms that can handle a dynamically growing number of mixture components, as mentioned in the DP notebook (Gibbs sampling, stochastic memoization).
- Add hierarchical DPs (HDPs), as also mentioned in issue #1748.
- Implement online/mini-batch HDP, which can be useful for large corpora (the LDA notebook might be relevant).
- Add (specialized) variational inference for DP mixtures.
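
To make the first item concrete, here is a minimal sketch of a truncated stick-breaking construction in plain Python. It is only an illustration under stated assumptions: it is not the PyMC or Edward API, and the function name `stick_breaking_weights` and the truncation parameter `num_components` are hypothetical. Each step draws a Beta(1, alpha) fraction of the remaining stick, and the leftover mass goes to the last component so the weights sum to one.

```python
import random


def stick_breaking_weights(alpha, num_components, rng=None):
    """Return truncated stick-breaking weights for a DP with concentration alpha.

    Hypothetical helper for illustration; not part of any library.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if num_components < 1:
        raise ValueError("num_components must be at least 1")
    rng = rng or random.Random()

    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(num_components - 1):
        fraction = rng.betavariate(1.0, alpha)  # Beta(1, alpha) break point
        weights.append(fraction * remaining)
        remaining *= 1.0 - fraction
    # Truncation: give whatever is left to the final component.
    weights.append(remaining)
    return weights


# Example usage with a fixed seed for reproducibility.
weights = stick_breaking_weights(alpha=2.0, num_components=20, rng=random.Random(0))
```

A smaller `alpha` tends to concentrate mass on the first few components, while a larger `alpha` spreads it over more of them. Samplers with a dynamically growing number of components would avoid the fixed truncation used here.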

What do you think? What is the best way to proceed? Any comments are appreciated.

aloctavodia · March 30, 2019, 1:44pm

Hi @yell

Thanks for contacting us. I would recommend that you follow this guide, especially the part about submitting a PR, as this is a hard requirement from Google. Overall, the topics you are proposing seem right, as they would make DPs in PyMC3 easier to use and also much more efficient. @AustinRochford, @ferrine, what do you think?

BTW, @yell, you may want to check this and this to help you write a proposal. Note that we can help during the process; you do not have to write it all by yourself.

yell · April 3, 2019, 11:48pm

I’d be extremely grateful to receive feedback on my proposal draft.

Many thanks for your time and consideration.
