Dirichlet Processes for GSoC 2019

My name is Yelysei, and I’m a Master’s student in Computational Science and Engineering at TU Munich.
I would really like to take Dirichlet Processes to the next level in PyMC this summer.

Some relevant background: in my Bachelor’s thesis I wrote an overview of Bayesian methods in ML, where I also used some models (PPCA, BPCA) from PyMC. I have more than a year of experience implementing custom models in TensorFlow, such as energy-based models and discrete variational autoencoders. In particular, I learned about DPs and HDPs from this paper and was amazed by how cool they are. My GitHub.

I’ve already set up a dev environment and had a small PR merged. I carefully read the dev guide and more or less got a feel for the core infrastructure, and of course I have seen the DP notebook.

Some thoughts about what can be done:

  1. perhaps the first thing is to encapsulate the stick-breaking process, something like DirichletProcess from Edward, which could also be useful for the upcoming PyMC4
  2. then develop other sampling algorithms that allow a dynamically growing number of mixture components, as mentioned in the DP notebook (Gibbs sampling, stochastic memoization)
  3. add hierarchical DPs, as also mentioned in issue #1748
  4. implement online/mini-batch HDP, which can be useful for large corpora (the LDA notebook might be relevant)
  5. add (specialized) variational inference for DP mixtures
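
To make item 1 concrete, here is a minimal sketch of what a truncated stick-breaking helper would compute. It uses plain NumPy rather than any PyMC or Edward API, and the function name `stick_breaking_weights`, its parameters, and the truncation level of 30 are all hypothetical choices for illustration only:

```python
import numpy as np


def stick_breaking_weights(alpha, num_components, rng):
    """Truncated stick-breaking weights (hypothetical helper, not a PyMC API).

    Draws beta_k ~ Beta(1, alpha) and returns
    w_k = beta_k * prod_{j < k} (1 - beta_j) for k = 1..num_components.
    """
    # Fraction of the remaining stick broken off at each step.
    betas = rng.beta(1.0, alpha, size=num_components)
    # Length of stick still remaining before each break (1.0 before the first).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return betas * remaining


rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=2.0, num_components=30, rng=rng)
# The truncated weights sum to slightly less than 1; the shortfall is the
# mass of the components that the truncation leaves out.
leftover_mass = 1.0 - weights.sum()
```

An encapsulated version inside PyMC would presumably wrap the same computation in random variables; the fixed truncation here is exactly the limitation that item 2 is meant to lift.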

What do you think? What is the best way to proceed? Any comments are appreciated.


Hi @yell

Thanks for contacting us. I recommend that you follow this guide, especially the part about submitting a PR, as this is a hard requirement from Google. Overall, the topics you are proposing seem right, as they will make DPs in PyMC3 easier to use and also much more efficient. @AustinRochford, @ferrine what do you think?

BTW, @yell, you may want to check this and this to help you write a proposal. Note that we can help during the process; you do not have to write it all by yourself.


I’d be extremely grateful to receive feedback on my proposal draft.

Many thanks for your time and consideration.