Memory is never very reliable, but my recollection is that we didn’t want to use Theano for two reasons.
a. It was embedded in Python, and we didn’t want a Python dependency.
b. We thought it was doing purely symbolic autodiff, which we believed was a dead end for efficiency: it’s hard to generate performant derivative code symbolically, whereas reverse-mode autodiff automatically gives you a dynamic programming solution for shared variables.
For (b), this was mainly me, Matt Hoffman, and Daniel Lee. The big decision point for us was whether to work dynamically (like PyTorch) or statically (like TensorFlow). We went with the dynamic approach because it was going to be much easier to launch a math library and more flexible for users. In retrospect, given the architecture of GPUs, the TensorFlow approach adopted by JAX makes a lot of sense.
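To make the point about shared variables concrete, here’s a minimal tape-based reverse-mode sketch in plain Python (nothing like Stan’s actual C++ implementation, just an illustration): every node carries a single adjoint slot, and each use of a shared variable accumulates into it during one backward sweep, which is exactly the dynamic programming that symbolic differentiation struggles to recover.

```python
# Minimal tape-based reverse-mode autodiff sketch (illustrative only).
# Each Var records its parents and the local partial derivatives; the
# backward sweep accumulates adjoints, so a variable shared across
# subexpressions is handled once per use instead of re-deriving a
# separate symbolic expression for every path through the graph.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents      # list of (parent Var, local partial)
        self.adjoint = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def grad(output):
    """Sweep the tape in reverse topological order, accumulating adjoints."""
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)
    visit(output)
    output.adjoint = 1.0
    for v in reversed(order):
        for parent, partial in v.parents:
            parent.adjoint += partial * v.adjoint  # shared nodes accumulate

# f(x) = x*x + x*x: x is shared across subexpressions, but its adjoint
# accumulates in a single pass, giving df/dx = 4x = 12 at x = 3.
x = Var(3.0)
y = x * x + x * x
grad(y)
print(x.adjoint)   # 12.0
```

Because the tape is built as the program runs, this is the “dynamic” style: you pay a little overhead per operation, but any control flow the user writes just works, which is what made it attractive for launching a math library.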
Also, it was really hard to find this stuff. Autodiff was super niche when we started in 2010, mostly being used by the applied math community to do sensitivity analysis on solvers. We didn’t find ADMB until much later, for example; that was a really groundbreaking autodiff (AD) model building (MB) system consigned to obscurity in the fisheries and wildlife community. Sort of the way emcee is largely used by the astrophysics community (it’s pure Python and gradient-free, with really clever auto-tuning that is similar to what Matt Hoffman’s doing these days).