I joined here (from the Stan forums) to say that @bob-carpenter and the other founding Stan members are incredible. The apocryphal story (i.e., I'm making this up) is that they looked at Theano, couldn't understand the docs, so they said NUTS to that, we'll just build our own AD system from scratch, our own PPL, our own compiler, and this newfangled multinomial HMC thing.
Since the topic is about user experience, I hope my comments are relevant.
I spend most of my time writing ODE systems for infectious disease modelling. I appreciate that there are several research papers out there that provide workflows for this kind of analysis in Stan (especially after COVID). Those workflows are quite useful and relevant to me too. When I started modelling, though, I found model specification in Stan a bit daunting, because my model structures are quite hard to formalise as equations in Stan. If those workflows had been available at the beginning, I probably would have made different decisions.
PyMC was easier for me to pick up. The process from ideas to model structures to code is faster too. I think that's because I was more familiar with Python, so that's that.
On a side note, I saw this tutorial on NumPyro in epidemiology. It looks very interesting. After I finish my thesis, I will have a look at it. My feeling is that NumPyro is also very Python-centric (?)
I’d like to add that PyMC, as a PPL, includes some very useful and powerful “helper” (?) functions to build models and understand my problem. Thanks to the PyMC developers.
Thanks—this is really great. I’d love to hear from more users like this.
This is the common refrain I’ve heard from PyMC users—they really really don’t want to learn another language on top of Python. For reasons I don’t understand, the R users are less resistant. Maybe because the syntax is more similar? I would have expected more resistance because R is also implicitly typed. If I had it to do over again, I would have followed NumPy’s structure and not tried to differentiate arrays and linear algebra types the way that Boost (C++ matrix library) does. The advantage of doing it the way Boost does is that you can infer the types of the result. A vector times a row vector is a matrix and a row vector times a vector is a scalar. Back when I was a professor I worked on a mix of PL (programming languages) and NLP (natural language processing), largely around typing.
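The typing point can be made concrete: in a library with distinct vector/row-vector/matrix types the result type of a product is inferable statically, whereas NumPy has a single array type whose shapes only settle at runtime. A small sketch (the arrays are made up for illustration):

```python
import numpy as np

# NumPy has one array type, so "vector vs row vector" is just shape.
col = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1): a column vector
row = np.array([[4.0, 5.0, 6.0]])      # shape (1, 3): a row vector

outer = col @ row  # (3, 1) @ (1, 3) -> (3, 3) matrix (outer product)
inner = row @ col  # (1, 3) @ (3, 1) -> (1, 1) array, not a true scalar

print(outer.shape)  # (3, 3)
print(inner.shape)  # (1, 1)
```

Note that the row-vector-times-vector result comes back as a 1x1 array rather than a scalar, which is exactly the kind of conflation discussed elsewhere in this thread.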
Could you elaborate on what the helper functions do?
Memory is never very reliable, but my recollection is that we didn’t want to use Theano for two reasons.
a. It was Python embedded, and we didn’t want a Python dependency.
b. We thought it was doing purely symbolic autodiff, which we thought was a dead end for efficiency (it’s hard to generate performant derivative code symbolically—reverse mode autodiff automatically creates a dynamic programming solution for shared variables).
For (b), this was mainly me, Matt Hoffman, and Daniel Lee. The big decision point for us was whether to work dynamically (like PyTorch) or statically (like TensorFlow). We went with dynamic because it was going to be much easier to launch a math library and more flexible for users. In retrospect, given the architecture of GPUs, the TensorFlow approach adopted by JAX makes a lot of sense.
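The point in (b), that reverse-mode autodiff automatically creates a dynamic-programming solution for shared subexpressions, can be sketched in a few lines of Python. This is a toy illustration, not any real library's implementation:

```python
class Var:
    """Toy reverse-mode AD node: stores a value plus links to parents."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # list of (parent_node, local_partial)
        self.adjoint = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   parents=[(self, other.value), (other, self.value)])

def grad(output, wrt):
    """Reverse sweep: each node's adjoint is *accumulated* (+=), so a
    subexpression used in several places is processed once, with its
    contributions summed. That accumulation is the dynamic-programming
    step that naive symbolic differentiation misses."""
    # Topological order via depth-first search.
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)
    visit(output)
    output.adjoint = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.adjoint += node.adjoint * local
    return wrt.adjoint

x = Var(3.0)
s = x * x          # shared subexpression
y = s * s          # y = x**4, so dy/dx = 4 * x**3 = 108
print(grad(y, x))  # 108.0
```

The shared node `s` gets its adjoint summed from both of its uses before its own gradient is pushed back to `x`, so the work for `s` is never duplicated.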
Also, it was really hard to find this stuff. Autodiff was super niche when we started in 2010, mostly being used by the applied math community to do sensitivity analysis on solvers. We didn't find ADMB until much later, for example—that was a really groundbreaking autodiff (AD) model building (MB) system consigned to obscurity in the fisheries and wildlife community. Sort of the way emcee is largely used by the astrophysics community (it's pure Python and gradient-free, with really clever auto-tuning that is similar to what Matt Hoffman's doing these days).
Theano was definitely doing reverse-mode (and, more patchily, forward-mode) AD all along, and worrying about avoiding duplicate computations between the forward pass and the autodiff pass. It also had higher-order AD from the beginning.
TensorFlow 1.x was greatly inspired by Theano.
I've recently worked with Stan again a little, after mostly using PyMC for years, and the thing I missed most was (perhaps surprisingly) support for named dimensions. It might sound like a pretty minor detail, but to me it really makes a difference in practice.
And that is even though PyMC's support is hardly perfect: the random variables remember their dimension names, but intermediate variables don't, and just use NumPy broadcasting rules.
If I were to redesign something, I think that would be pretty high on my list of things to support.
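To illustrate why named dimensions help, here is a toy labeled-array wrapper. The `Labeled` class and its alignment rule are invented for illustration; PyMC (via model coords/dims) and xarray do this far more completely:

```python
import numpy as np

class Labeled:
    """Toy labeled array: a NumPy array plus dimension names.
    (Invented for illustration, not PyMC's or xarray's actual API.)"""
    def __init__(self, data, dims):
        data = np.asarray(data)
        assert data.ndim == len(dims)
        self.data, self.dims = data, tuple(dims)

    def __add__(self, other):
        # Align by *name* instead of by position: permute the other
        # operand's axes to match our dimension order before adding.
        perm = [other.dims.index(d) for d in self.dims]
        return Labeled(self.data + other.data.transpose(perm), self.dims)

a = Labeled(np.ones((2, 3)), dims=("city", "year"))
b = Labeled(np.ones((3, 2)), dims=("year", "city"))
c = a + b                    # aligned by name, no silent mis-broadcast
print(c.dims, c.data.shape)  # ('city', 'year') (2, 3)
```

With positional NumPy arrays, adding the transposed operand above would either error out or silently broadcast into something unintended; names make the alignment explicit.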
The NumPy broadcasting rules are actually a bit of a pain to work with in PyTensor too. If you want to follow them strictly, you can't tell until runtime what, for instance, a multiplication of two arrays will actually do. Depending on whether a shape is one, it might broadcast, error out, or do elementwise multiplication. But which one happens has implications for the derivatives, because those aren't the same across the cases.
If I could go back in time, I'd love to tell the NumPy devs not to use a dimension length of 1 to indicate that something can be broadcast, but a dimension length of None instead. I think that would have prevented a lot of strange bugs over the years…
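A minimal NumPy demonstration of that runtime ambiguity (the arrays here are made up for illustration):

```python
import numpy as np

a = np.ones((3, 1))

b_same = np.ones((3, 1))   # matching shapes: plain elementwise multiply
b_wide = np.ones((3, 4))   # size-1 axis in `a` silently broadcasts

print((a * b_same).shape)  # (3, 1)
print((a * b_wide).shape)  # (3, 4)

# A genuine mismatch only raises at runtime:
try:
    np.ones((3, 4)) * np.ones((3, 5))
except ValueError:
    print("incompatible shapes raise ValueError")

# The gradient of `a * b` w.r.t. `a` differs across these cases: when `a`
# broadcasts, the gradient must be summed over the broadcast axis; when it
# doesn't, no reduction happens. A graph compiler can't know which case
# applies until it sees the concrete shapes.
```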
This has been really high among our users' requests. Probably because it's pretty well supported everywhere in R.
While you're at it, could you slip a note to the R developers not to conflate size-1 arrays with scalars? That is such a mess on the debugging side, and R goes a lot further into the weeds than NumPy.
It could be force of habit, as I was taught C++ and MATLAB during my college days.
Thanks a lot for providing the background knowledge about Stan (and Bayesian methods). Very good learning materials!
I like the ArviZ library in general. Within PyMC, there are two functions I find particularly useful:

- `pm.Potential`: a handy function for imposing constraints. I used it to weight my data points; I need to do evidence synthesis from multiple sources, so being able to weight data is great.
- `pm.Truncated`: when building a model, I need to look at particular regions of my variables for various reasons (e.g., experimenting with model structures/functions, sourcing priors from the literature, or sometimes just to get some results because the model is not fitting properly). I did make a point of removing all these truncated distributions in the final model.
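For readers unfamiliar with pm.Potential: it adds an arbitrary term to the model's joint log-probability. The data-weighting pattern mentioned here can be sketched in plain Python (a conceptual sketch with made-up numbers, not PyMC code):

```python
import math

def normal_logpdf(x, mu, sigma=1.0):
    # Log density of a normal distribution, written out by hand.
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))

def weighted_loglik(data, weights, mu):
    # Scale each observation's log-likelihood contribution by a weight.
    # Adding `weight * logp` terms to the joint log-probability is the
    # kind of thing pm.Potential enables inside a PyMC model.
    return sum(w * normal_logpdf(x, mu) for x, w in zip(data, weights))

data = [1.0, 2.0, 10.0]
full = weighted_loglik(data, [1.0, 1.0, 1.0], mu=1.5)
down = weighted_loglik(data, [1.0, 1.0, 0.1], mu=1.5)  # down-weight one source
print(full < down)  # True: the down-weighted outlier source counts for less
```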
The Stan docs recommend ArviZ for CmdStanPy. Aki Vehtari wrote some of the code for the methods he refined (R-hat, ESS, LOO) and made sure they give the same results as Stan's R package `posterior` and as our reference implementations in C++. I really like seeing the results synchronized like this across state-of-the-art implementations.