Hi @nanounanue, I just saw your question. I’ve used Stan and PyMC3 a lot, and I served as technical lead on the evaluation team for PPAML, the DARPA program that funded Figaro and Anglican. I like @junpenglao’s suggestion of checking out Rainforth’s thesis, but I wonder if some discussion could be helpful as well.
There’s an analogy I’ve found helpful in thinking about how these different systems relate. Say someone vaguely asks you to “drive them to a thing”, and you get to choose between a luxury sedan and one of those hardcore off-road buggy things (there must be a term for those, I’ll go with “buggy”) like they used in Westworld. Which is the better choice?
If you need to be able to go absolutely anywhere, you should take the buggy. If it’s in-town driving, the buggy will be fun for a while and then probably get tiring. And if it’s a highway road trip, the buggy will not be nearly as enjoyable as the sedan.
By constraining the problem space (“normal” road driving), the sedan can do a better job when it’s in its preferred space. That doesn’t mean the sedan is “better”, or that you couldn’t use the buggy for absolutely everything.
Ok, back to the point. All of the systems you mention satisfy Avi’s description. “Universal” probabilistic programming (as in Anglican) is the buggy. It can represent any model you throw at it. Stan is the other extreme, limited to a fixed-dimensional parameter space over differentiable distributions. Stan can’t represent nearly as many models. Things like mixture models are a bit awkward, because you have to jump through some hoops to sum over the discrete values (weird that they haven’t automated that). But when you do sum them out, you get the benefit of “Rao-Blackwellization”, making inference more efficient. And Stan generates efficient C++ code, so (after you get past compiler overhead) inference is very fast.
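To make the “sum over the discrete values” step concrete, here’s a minimal sketch in plain Python. This is not Stan code and not how Stan does it internally; the function names (`normal_logpdf`, `log_sum_exp`, `mixture_loglik`) are just ones I made up for illustration. The idea is that instead of sampling a component label for each data point, you add up the weighted densities of all components, working in log space for numerical stability.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def mixture_loglik(data, weights, mus, sigmas):
    """Log-likelihood of a Gaussian mixture with the discrete
    component label summed out for every data point."""
    total = 0.0
    for x in data:
        # One term per component: log(weight) + log density under that component.
        terms = [math.log(w) + normal_logpdf(x, mu, s)
                 for w, mu, s in zip(weights, mus, sigmas)]
        total += log_sum_exp(terms)
    return total

# Hypothetical two-component example; the numbers are arbitrary.
loglik = mixture_loglik([0.1, 1.9, -0.7], [0.4, 0.6], [-1.0, 2.0], [1.0, 0.5])
```

The result depends only on the continuous parameters (weights, means, scales), which is what lets a gradient-based sampler work with it.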
Well, that almost does it. There’s still the potential for general-purpose systems like Anglican to recognize (automatically or through user input) that the model is constrained, and to go with a specialized algorithm. In principle, a universal PPL can give the best of both worlds. In practice, this can come down to the time and energy of the development team, characteristics of the language they’re using for development, and the degree to which they can leverage external tools.
In terms of “representable models”, it’s something like
Stan < PyMC3 < Figaro < Anglican
(Figaro has some advantage over PyMC3 because it allows an “open world” - you can introduce new variables as you go).
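As a rough illustration of what “introduce new variables as you go” means, here’s a small plain-Python sketch of a generative program whose number of latent variables is itself random. It doesn’t use the Figaro or Anglican APIs; `sample_open_world` and `p_stop` are hypothetical names, and this only shows forward sampling, not inference.

```python
import random

def sample_open_world(rng, p_stop=0.3):
    """Draw one trace; how many variables it contains is decided
    by random choices made while the program runs."""
    trace = {}
    k = 0
    while True:
        # Create a brand-new latent variable on each pass through the loop.
        trace[f"x{k}"] = rng.gauss(0.0, 1.0)
        k += 1
        # Stop with probability p_stop, so the trace length is random.
        if rng.random() < p_stop:
            break
    return trace

rng = random.Random(0)
traces = [sample_open_world(rng) for _ in range(5)]
sizes = [len(t) for t in traces]  # the dimension can differ from trace to trace
```

A fixed-dimensional system has no direct way to write this down, since the set of variables isn’t known before the program runs.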
Overall, these are all great, and I think the main factor I would use to decide is what language you prefer to work in:
C++, command line, or R -> Stan
Python -> PyMC3
Scala -> Figaro
Clojure -> Anglican