How does Turing.jl compare to PyMC3?

Hello :slight_smile:

I recently crossed paths with a Julia evangelist who directed my attention to the probabilistic programming library Turing.

After a quick skim through the documentation I noticed that it is very similar to PyMC3 in terms of syntax (with slightly less boilerplate code). I was therefore wondering if anyone has experience with it and could provide insights on the main differences from PyMC3.

In particular with respect to sampling speed, since that seems to be one of the major advantages Julia has over Python.


My main problems when I try to use Turing.jl are the small community and Turing’s reliance on Distributions.jl, which lacks convenient parametrizations for many distributions. The main advantage of Turing for me was being able to use Julia, a language designed from the ground up for numerical and statistical work, rather than an object-oriented language that later had data science packages bolted onto it. Turing also samples a little faster, but loading packages takes so long in Julia that it may not be worth it until the language is a bit older and the time-to-first-plot issue has been worked on some more.


I’ve played a bit with Turing.jl and thought it was pretty neat. What I liked about it was that it seemed to have native support for truncated distributions, via the truncated() function, and also that you can use loops to specify your model without fear of slowing it down considerably.
In practice, I stumbled on a bug while trying to fit a zero-truncated Poisson. The community (on Slack) was, however, quite amazing, and within a day there was a PR to fix some of the problems I had faced.
I wouldn’t write it off, even with the time-to-first-plot problem. I’m currently experimenting with PyMC3, Stan, and Turing.jl, so I can’t say I’m experienced in any of them.
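For anyone unfamiliar with the zero-truncated Poisson mentioned above, here is a minimal stdlib-only Python sketch of what truncation means in this case: draw from the base distribution and reject the excluded values. This is just an illustration of the idea, not how Turing or Distributions.jl actually implement truncated(), and the helper names (`poisson_sample`, `zero_truncated_poisson`) are my own.

```python
import math
import random

def poisson_sample(lam, rng=random):
    # Knuth's algorithm: count how many uniforms we can multiply
    # together before the product drops below exp(-lam).
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def zero_truncated_poisson(lam, rng=random):
    # Rejection sampling: redraw from Poisson(lam) until the value is nonzero.
    # For lam = 2 only ~13.5% of draws are rejected, so this is cheap.
    while True:
        k = poisson_sample(lam, rng)
        if k > 0:
            return k

rng = random.Random(42)
draws = [zero_truncated_poisson(2.0, rng) for _ in range(20000)]
# The mean of a zero-truncated Poisson is lam / (1 - exp(-lam)), ~2.31 for lam = 2.
print(min(draws), sum(draws) / len(draws))
```

Rejection works here because zero is a single cheap-to-discard value; for truncation that removes most of the mass (e.g. a far tail), libraries instead use inverse-CDF methods.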


I haven’t used Julia much and can’t speak about speed comparisons at all; I’m also not very interested in speed comparisons between libraries. But I want to note that at ArviZ we are also working on interoperability. It is already possible to run inference in Turing, save the result as NetCDF, and analyze it in Python, or the other way around if someone wanted to do that.

And that is not all: thanks to the already amazing ArviZ-PyMC3 integration, you can run inference with Turing, Stan, or whatever, and then use PyMC3 to sample from the posterior predictive using the posterior samples from an arbitrary InferenceData. You only need the variable names to match between the InferenceData and the PyMC3 model.

I also want to note the ability to use named dimensions to specify PyMC3 models (which will become even more flexible in v4), which is something no other PPL allows as far as I know. For me it’s a huge pro: it helps me when writing a model for the first time, when explaining my model to other people, and when trying to understand other people’s models, or even my own, after some months or years.


Thank you everyone for the insightful answers.

I wanted to evaluate the potential differences in sampling speed myself, and I found Turing.jl to be noticeably faster than PyMC3 (although it took me ~10 minutes to write the PyMC3 code versus the hour of heavy googling required for Turing.jl).

Here is a GitHub repository with the code and results.

The experiment is still a WIP and the results should be taken with a grain of salt, since I am a PyMC3 noob and had never worked with Turing before.

If anyone can spot problems with the methodology and wants to share them, it would be greatly appreciated.

One thing you might want to compare is the effective sample size (ESS). It could be that one of the libraries is giving you more information per sample, which could decrease or increase the performance gap you are hinting at.
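To make the ESS point concrete: autocorrelated draws carry less information than independent ones, and ESS roughly measures how many independent draws the chain is worth. Below is a deliberately simplified, stdlib-only Python sketch of the idea (ArviZ and Stan use a more robust estimator with multiple chains, rank-normalization, and paired autocorrelation sums; this is not their algorithm):

```python
import random

def autocorr(x, lag):
    # Sample autocorrelation of the sequence x at the given lag.
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

def ess(x):
    # Simplified single-chain ESS: n / (1 + 2 * sum of autocorrelations),
    # truncating the sum at the first non-positive autocorrelation
    # (a crude version of Geyer's initial-sequence criterion).
    n = len(x)
    tau = 1.0
    for lag in range(1, n // 2):
        rho = autocorr(x, lag)
        if rho <= 0:
            break
        tau += 2 * rho
    return n / tau

# An AR(1) chain mimics sticky MCMC output: the closer phi is to 1,
# the more autocorrelated the draws and the lower the ESS per draw.
rng = random.Random(0)
phi = 0.9
x, v = [], 0.0
for _ in range(5000):
    v = phi * v + rng.gauss(0.0, 1.0)
    x.append(v)
print(round(ess(x)))
```

For phi = 0.9 the theoretical integrated autocorrelation time is (1 + phi) / (1 - phi) = 19, so 5000 draws are worth only a few hundred independent samples, which is why raw seconds-per-sample can be a misleading benchmark.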

In any case, I wouldn’t be surprised if PyMC3 were comparatively slower.


One important thing is to test models at very different scales: toy-scale, a small real-world problem (<100 parameters), and something huge (e.g. thousands of parameters, including hierarchical structure and strange topologies).

For example, when we tried to use TensorFlow Probability for a work project, the GPU-accelerated NUTS worked very well on a toy problem. However, we could never get it to work on our actual problem (the recursion unwinding tried to allocate terabytes of memory), while PyMC3 could work through a small chain for that model in about a day. The scale there was 10^4 to 10^5 parameters and data points, in a panel.

I’m unsure what dataset & model would make sense to use, though.