CustomDist infers logp expression from random generating graph

Am I understanding this right: CustomDist can just take a function for generating random variables and somehow turn that into a density? I couldn’t quite follow the documentation for the dist argument of CustomDist, because it just says “represent the distribution” without saying whether that representation is a method for generating draws or a method for evaluating the log density. There also seem to be random and logp arguments to CustomDist, and dist looks like what I would’ve expected random to be from the doc.

In Stan, we could define $y$ as a transformed parameter, but then we’d have to make the Jacobian adjustment for the non-linear transform of $\mu$ and $\sigma$ to get the density if we in fact observe $y$ (as opposed to $y$ being a parameter, in which case no Jacobian adjustment is needed; we can just define $\epsilon$ and transform).

To work out the density of $Y = (x + \mu + \tau)^2$ with $\tau \sim \textrm{normal}(0, \sigma)$, let $f(\tau) = (x + \mu + \tau)^2$ and apply the change-of-variables formula to get $p_Y(y) = \textrm{normal}(f^{-1}(y) \mid 0, \sigma) \cdot |{f^{-1}}'(y)|$. I’ve used the scale parameterization of the normal here.
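Concretely, taking the branch where $x + \mu + \tau > 0$ so that $f$ is invertible (the full density sums the contributions of both preimage branches of the square):

$$
f^{-1}(y) = \sqrt{y} - x - \mu,
\qquad
\left|{f^{-1}}'(y)\right| = \frac{1}{2\sqrt{y}},
$$

so that

$$
p_Y(y) = \textrm{normal}\!\left(\sqrt{y} - x - \mu \mid 0, \sigma\right) \cdot \frac{1}{2\sqrt{y}}.
$$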

Yes, it does so by graph rewriting: it tries to invert the graph one step at a time, adding the corresponding change-of-variables expression, with checks for things like: if there’s an addition, only one RV may lead into it (variables we are already conditioning on don’t count), otherwise it would have to be a convolution…

There’s also more interesting stuff, like Normal(mu)[:5] -> Normal(mu[:5]), or clip(Normal(), -1, 1) -> CensoredNormal(-1, 1).
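A minimal sketch of the clip case with CustomDist (the function name here is just illustrative):

```python
import pymc as pm

def censored_normal(mu, sigma, size):
    # Draws are generated by clipping a normal to [-1, 1]; PyMC recognizes
    # the clip in the graph and infers the censored-normal logp, including
    # the point masses at the bounds.
    raw = pm.Normal.dist(mu, sigma, size=size)
    return pm.math.clip(raw, -1, 1)

y = pm.CustomDist.dist(0.0, 1.0, dist=censored_normal)
```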

The idea is that many named distributions out there are just simple algebraic rewrites of other, more standard forms, and we can avoid implementing all of those or making the user define the densities themselves. The user just defines how random draws are generated from some base distributions, and we figure out the rest.
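A minimal sketch of what that looks like for the lognormal case (names are illustrative, assuming a recent PyMC):

```python
import numpy as np
import pymc as pm

def exp_normal(mu, sigma, size):
    # Generative graph only: y = exp(z), z ~ Normal(mu, sigma).
    # No logp is provided anywhere.
    return pm.math.exp(pm.Normal.dist(mu, sigma, size=size))

with pm.Model():
    mu = pm.Normal("mu", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    # PyMC inverts the exp and adds the log-Jacobian term,
    # recovering the lognormal density for the observations.
    pm.CustomDist("y", mu, sigma, dist=exp_normal, observed=np.exp([0.1, -0.3]))

# Sanity check: the inferred logp matches the built-in LogNormal
y_rv = pm.CustomDist.dist(0.0, 1.0, dist=exp_normal)
np.testing.assert_allclose(
    pm.logp(y_rv, 2.0).eval(),
    pm.logp(pm.LogNormal.dist(0.0, 1.0), 2.0).eval(),
)
```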

For instance, this is how we implement the Chi distribution internally: https://github.com/pymc-devs/pymc-extras/blob/24e18e8a00e33bbbf17e9a4d4c444ef022b5ce9e/pymc_extras/distributions/continuous.py
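Paraphrasing the linked code, the definition boils down to something like:

```python
import pymc as pm

def chi_dist(nu, size):
    # A Chi(nu) draw is the square root of a ChiSquared(nu) draw;
    # the Chi logp is inferred from this graph automatically.
    return pm.math.sqrt(pm.ChiSquared.dist(nu, size=size))

chi = pm.CustomDist.dist(3, dist=chi_dist)
```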

Historical detail: the original CustomDist had those random and logp arguments; it now also takes a dist argument, which more or less supersedes both and triggers the automatic logp inference I mentioned. We couldn’t just reuse random, because that expects a NumPy function, not a PyTensor expression. I have been meaning to revamp the documentation for a while.

Since I don’t get asked about it that often: there’s a cutesy probability puzzle notebook that builds on top of this functionality: Google Colab


Cool. This sounds like Oryx. Judging from the docs, it looks like Oryx has refocused on known transforms and compositionality.

The Wikipedia sections on related distributions are priceless for this. You can do all the location/scale families this way from their standard forms. A lognormal is just a normal with an exp transform. A chi-square with one degree of freedom is just a squared standard normal. There’s a huge literature on random variate generation and how to do it efficiently and/or to satisfy testing requirements.
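For example, if $Z$ has density $p_Z$ and $X = \mu + \sigma Z$ with $\sigma > 0$, the change of variables gives

$$
p_X(x) = \frac{1}{\sigma}\, p_Z\!\left(\frac{x - \mu}{\sigma}\right).
$$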

Yes, it’s similar in nature to Oryx, except we don’t need to fight with user-defined JAX functions. And we can more easily take a holistic view of the graph (when that’s needed).

On probability equivalences, I once came upon a kind of catalogue that was incredibly exhaustive, more so than Wikipedia. But I’m not sure how I could find it again.

Ah, and tensorflow-probability also has some interesting work on transformed distributions, with advanced linear-algebra tricks. They use a more object-oriented approach.
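For instance, building a lognormal explicitly as a transformed distribution in TFP looks roughly like this (standard TFP API, sketched here):

```python
import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors

# exp(Normal) as an object: the bijector carries the forward transform,
# its inverse, and the log-Jacobian correction used by log_prob.
log_normal = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0.0, scale=1.0),
    bijector=tfb.Exp(),
)
log_prob = log_normal.log_prob(2.0)
```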

Found the catalogue thing: ProbOnto