# PyTorch backend for PyMC4

#22

I’ve read quite a bit recently about automatic differentiation:

“Automatic Differentiation: The most criminally underused tool in the potential machine learning toolbox?” : https://justindomke.wordpress.com/2009/02/17/automatic-differentiation-the-most-criminally-underused-tool-in-the-potential-machine-learning-toolbox/

1. You write a subroutine to compute a function $f(\mathbf{x})$ (e.g. in C++ or Fortran). You know $f$ to be differentiable, but don’t feel like writing a subroutine to compute $\nabla f$.
2. You point some autodiff software at your subroutine. It produces a subroutine to compute the gradient.
3. That new subroutine has the same complexity as the original function! It does not depend on the dimensionality of $\mathbf{x}$.
4. It also does not suffer from round-off errors!
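To make the quoted steps concrete, here is a minimal sketch of reverse-mode automatic differentiation in plain Python. It is not how Theano, PyTorch, or Stan implement it; the names `Var`, `sin`, and `gradient` are made up for this illustration. The point is that one backward pass over the recorded operations yields every partial derivative at once, at a cost proportional to evaluating the function itself, and that the derivatives are exact up to floating-point arithmetic rather than finite-difference approximations.

```python
import math


class Var:
    """A scalar that remembers how it was computed (hypothetical helper)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__


def _lift(x):
    """Wrap plain numbers so they can take part in the recorded graph."""
    return x if isinstance(x, Var) else Var(float(x))


def sin(x):
    """Sine with its local derivative cos(x) recorded."""
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))


def gradient(f, xs):
    """Return f(xs) and the list of partial derivatives df/dx_i."""
    inputs = [Var(float(x)) for x in xs]
    out = f(inputs)

    # Order the nodes so that every node comes after all of its parents.
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(out)

    # Single backward sweep: push each node's gradient to its parents.
    out.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad
    return out.value, [v.grad for v in inputs]


def f(x):
    return x[0] * x[1] + sin(x[0])


value, grad = gradient(f, [2.0, 3.0])
print(value, grad)  # grad is [3 + cos(2), 2] analytically
```

Note that `f` is written as ordinary code; the gradient routine is obtained without deriving anything by hand.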

“Automatic Differentiation Variational Inference” : https://arxiv.org/abs/1603.00788

we develop automatic differentiation variational inference (ADVI). Using our method, the scientist only provides a probabilistic model and a dataset, nothing else. ADVI automatically derives an efficient variational inference algorithm, freeing the scientist to refine and explore many models. ADVI supports a broad class of models-no conjugacy assumptions are required. We study ADVI across ten different models and apply it to a dataset with millions of observations. ADVI is integrated into Stan, a probabilistic programming system; it is available for immediate use.

I have more if you like.

#23

Another tangent about probabilistic programming with functional programming techniques.

Posted some papers on the Figaro issue tracker:
https://github.com/p2t2/figaro/issues/347#issuecomment-336054456

The rationale is to show how function composition can be used to create composable MCMC algorithms.

(video bookmark) https://youtu.be/erGWMzzSUCg?list=PLnqUlCo055hX6SsmMr1AmW6quMjvdMPvK&t=1626
Key insight about composing handlers: sequential Monte Carlo (SMC) handler + MH handler => particle MCMC handler
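As a rough sketch of what "composable" can mean here (this is not the handler design from the video, and it is not particle MCMC), transition kernels can be treated as plain functions from a state to a new state and combined by function composition. The names `mh_kernel`, `compose`, and `run_chain` are hypothetical. If each kernel leaves the target distribution invariant, so does their composition.

```python
import math
import random


def mh_kernel(log_density, step_size):
    """Build a random-walk Metropolis-Hastings kernel: (state, rng) -> state."""
    def kernel(x, rng):
        proposal = x + rng.gauss(0.0, step_size)
        log_accept = log_density(proposal) - log_density(x)
        # Accept with probability min(1, exp(log_accept)).
        if rng.random() < math.exp(min(0.0, log_accept)):
            return proposal
        return x
    return kernel


def compose(*kernels):
    """Apply the given kernels one after another as a single kernel."""
    def kernel(x, rng):
        for k in kernels:
            x = k(x, rng)
        return x
    return kernel


def run_chain(kernel, x0, n_steps, seed=0):
    """Run a kernel repeatedly and collect the visited states."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        x = kernel(x, rng)
        samples.append(x)
    return samples


def log_normal(x):
    """Unnormalised log density of a standard normal."""
    return -0.5 * x * x


# A small-step and a large-step kernel combined into one sampler.
sampler = compose(mh_kernel(log_normal, 0.5), mh_kernel(log_normal, 2.0))
samples = run_chain(sampler, x0=0.0, n_steps=5000)
print(sum(samples) / len(samples))
```

The same `compose` works for any kernels with this signature, which is the appeal of designing the language of building blocks first.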

Again, the intent is to suggest a language/DSL first, framework last approach to make the most out of this “crisis” caused by Theano going away.

#24

And this today: Uber AI Labs Open Sources Pyro, a Deep Probabilistic Programming Language

#25

I was just about to share that here!

#26

I’ve started running through the Pyro docs examples, and oh boy, it looks powerful but the interface is seriously non-intuitive!

Then the thought came to mind: what if PyMC4 were a wrapper around Pyro? Like Keras is for Theano/TF? Perhaps offloading the hard math part to the budding Pyro community and defining the best interface for probabilistic programming? I’m quite under-informed on the kind of effort that’s needed for this, so this is just a thought, I guess…? I think I’ll get to bump into Colin tonight at Boston Bayesians, so I’ll try to get his thoughts…

#27

@ericmjl That’s a really interesting thought. We have considered the same with Edward / BayesFlow. Essentially both of those packages are aimed at researchers, giving a lot of flexibility at the cost of intuitive syntax. They can be viewed as a middle layer on top of the graph engine. PyMC3 has always shined at being beginner friendly with easy syntax, so it can be seen as targeting the top level.

Not sure the existing syntax could work with pyro, however, as the model creation needs to be rerun I think.

#28

One benefit of a dynamic graph would be for models with non-parametric priors such as the Chinese restaurant process (CRP) and the Indian buffet process (IBP). I don’t see how these models can be sampled with static graphs.
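To illustrate the point, here is a small sketch of drawing a seating arrangement from a CRP in plain Python (the function name `sample_crp` is made up for this example, and no inference is shown). The number of tables, and therefore the number of per-table parameters a model would need, is itself random and only known once the draw has been made. That is easy to express with ordinary control flow and awkward when the graph must be fixed in advance.

```python
import random


def sample_crp(n_customers, alpha, seed=0):
    """Draw table assignments from a Chinese restaurant process.

    Customer n (0-based) joins existing table k with probability
    size_k / (n + alpha) and opens a new table with probability
    alpha / (n + alpha).
    """
    rng = random.Random(seed)
    table_sizes = []   # grows as new tables are opened
    assignments = []
    for _ in range(n_customers):
        # Unnormalised weights: one per existing table, plus a new table.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[table] += 1    # join an existing table
        assignments.append(table)
    return assignments, table_sizes


assignments, table_sizes = sample_crp(n_customers=20, alpha=1.5)
print(len(table_sizes), "tables:", table_sizes)
```

Changing `seed` or `alpha` changes how many tables appear, so the shape of the downstream model changes from draw to draw.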

#29

I like this idea, but for now Pyro doesn’t implement MCMC. To my knowledge, Pyro is for Bayesian deep learning so it only has SVI.

#30

#31

A few teams are also working on implementing something similar to tensorflow.contrib.distributions in PyTorch. You can find their discussion of the design here: https://goo.gl/9ccYsq.

#32

What are the current running candidates for the PyMC4 backend? Pyro and PyTorch?

I personally have moved away from TensorFlow to PyTorch because of its intuitive API design. With Theano going out of development, PyTorch, to my knowledge, stands out as the best library for creating computational graphs and running automatic differentiation.

#33

@shkr: Thanks for your perspective. We haven’t really constrained the space too much. Options are: MXNet, TensorFlow, Edward, PyTorch, and Pyro. I’m listing higher-level packages as potential backends too, as we could build the PyMC3 API on top of those.

Is this something you’d be interested in exploring, perhaps as part of GSoC (in case you’re a student)?

#34

I am not a student, so I can’t participate in GSoC. But I do find time for open source work, so I can contribute via PRs/issues as needed.

#35

My two cents as one of the users:

I want a working inference button. A natural continuation of the line of thought that led to NUTS, the workhorse of PyMC3, is Riemannian HMC (RHMC), and I’m itching to try it out. Despite many papers coming out, as far as I know no user-friendly packages exist for it yet, only “research-quality” code. I believe that whoever is first to provide a friendly package for RHMC is going to win quite some user base.

From my brief exposure, it seems that people at Stan have been working on this at least since 2014 but haven’t rolled it out yet, likely because they have to write their own C++ for higher derivatives.

If we don’t want to block ourselves from being able to implement RHMC in the future, I think we should pick a backend where higher-order differentiation is a first-class citizen. From brief googling, none of the listed packages can fully claim this (although PyTorch seems to be moving in that direction). Autograd, which claims just that, was mentioned elsewhere, but was said to be slow. Are there any benchmarks?
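For readers wondering what “first-class” higher-order differentiation looks like, here is a toy sketch using forward-mode dual numbers in plain Python. It is not taken from any of the packages listed, and `Dual` and `derivative` are hypothetical names. The idea is that the derivative operator returns an ordinary function, so applying it again gives the second derivative with no extra hand-written code.

```python
class Dual:
    """Number real + eps * e with e**2 == 0; eps carries the derivative."""

    def __init__(self, real, eps=0.0):
        self.real = real
        self.eps = eps

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.real + other.real, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (a + b e)(c + d e) = ac + (ad + bc) e
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants with zero derivative."""
    return x if isinstance(x, Dual) else Dual(x)


def derivative(f):
    """Return a function that evaluates df/dx; it can be applied repeatedly."""
    def df(x):
        return f(Dual(x, 1.0)).eps
    return df


def f(x):
    return x * x * x + 2 * x


first = derivative(f)                # 3x^2 + 2
second = derivative(derivative(f))   # 6x
print(first(2.0), second(2.0))       # 14.0 12.0
```

This naive nesting is only meant to show the interface. It works for simple cases like the one above, but more involved nested expressions can run into the known "perturbation confusion" problem, and real backends need more care.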

From the above story with Stan, I conclude that the more exotic the backend’s syntax, the slower development will be, both because individual developers work more slowly and because less advanced users are unable to contribute (although the latter might be desired :)).

#36

We had a GSoC project to implement RHMC last year, but it turned out to be much more difficult than anticipated. Numerical stability becomes a huge issue. I think the reason that Stan hasn’t rolled it out is that it doesn’t work all that well in practice (but this is just my impression, I could be wrong). As such, my interest in RHMC is waning, and I think other methods like L2HMC are more interesting directions.

#37