Current version of HMC/NUTS in PyMC

Hi all,

Does anyone know whether the current version of PyMC has made any adjustments to the HMC/NUTS algorithm? For example, I know Stan made a number of changes, such as multinomial sampling instead of slice sampling and comparing between trees in NUTS (see Reference for current version of NUTS/HMC in Stan? - Algorithms - The Stan Forums), so I wanted a more definitive answer on whether any such algorithmic changes exist in PyMC.

Thanks!

There certainly are, but I wouldn’t know where to get started. Perhaps @aseyboldt or @colcarroll can give some hints.


This paper is very close to describing Stan’s algorithm in section 3.5:

The piece that’s missing there is that the U-turn condition is a bit more involved: we also check the halfway points for U-turns. So it’s not purely sub-U-turns and outer U-turns; when we combine two trees, A + B, we also check whether the midpoint of A makes a U-turn with the midpoint of B. There are more details in Michael Betancourt’s paper, which was the first description of the changes he made; it’s also a really nice paper for learning about HMC:
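For reference, the endpoint-only U-turn criterion from the original NUTS paper is the baseline that the extra midpoint checks described above augment. A minimal sketch (the function name and signature are my own, not from any library):

```python
import numpy as np

def u_turn(q_minus, q_plus, p_minus, p_plus):
    """Endpoint U-turn check from the original NUTS paper (Hoffman & Gelman 2014)."""
    # The trajectory is turning back on itself when the momentum at either
    # end points against the displacement between the two endpoints.
    dq = np.asarray(q_plus, dtype=float) - np.asarray(q_minus, dtype=float)
    return bool(np.dot(dq, p_minus) < 0 or np.dot(dq, p_plus) < 0)
```

The generalized criterion additionally applies this same dot-product test between the midpoints of the two subtrees being merged.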

The three-tier adaptation method we used is best described in the Automatic Parameter Tuning section of our reference manual in the Algorithms/MCMC Sampling chapter:

Nutpie

@aseyboldt has upgraded PyMC to use his Nutpie sampler, which does adaptation differently (and better as far as we have measured) than Stan. As far as I know, there’s not a good writeup of that anywhere. The main difference is that Adrian starts warmup with a diagonal mass matrix filled with absolute values of the gradient (this is the geometric mean of the outer product of gradients estimator and a unit mass matrix). Then he adapts more continuously as he goes, and has overlapping intervals during Phase II warmup. We’re hoping to add this functionality to Stan and help write it up.
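As a rough sketch of that initialization (my own notation, not nutpie’s actual code): the diagonal of the outer-product-of-gradients estimator is g_i², the unit mass matrix contributes 1, and their elementwise geometric mean is sqrt(g_i² · 1) = |g_i|:

```python
import numpy as np

def init_diag_mass(grad, reg=1e-10):
    # Sketch only, not nutpie's actual code: geometric mean of the
    # outer-product-of-gradients diagonal (g_i**2) and a unit mass
    # matrix (1) gives sqrt(g_i**2 * 1) = |g_i| per coordinate.
    # A small regularizer avoids exact zeros on flat directions.
    g = np.asarray(grad, dtype=float)
    return np.sqrt(g**2 * 1.0) + reg
```

The continuous adaptation with overlapping Phase II windows that follows this initialization is not sketched here.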


Thanks, Bob, for those helpful citations! So it seems the key takeaway is that, in PyMC’s case, the primary algorithmic change is in the tuning, which yields empirically better performance (although nutpie isn’t the default implementation, but rather an option for compiling a PyMC model and doing NUTS sampling with the mentioned tuning changes).

From a brief dig through the PyMC code, it seems that none of Stan’s other algorithmic changes are present, and this is reflected in the PyMC NUTS documentation, which says they use the dual-averaging implementation of NUTS from the original paper.

Thanks for the guidance. I’m writing a thesis, so I’m trying to be as nitpicky as possible when describing the background on the samplers used :sweat_smile:

Stan still uses dual averaging to estimate step size. We haven’t really changed the way we do adaptation since the other changes, which switched from slice sampling to multinomial sampling. We will probably switch to nutpie’s diagonal estimator as soon as we finish some more evaluations.
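To illustrate the multinomial change with a toy sketch (not Stan’s implementation): instead of slice sampling’s uniform draw among trajectory states whose energy falls below a slice level, the next state is drawn with probability proportional to exp(−H):

```python
import numpy as np

def multinomial_select(energies, rng):
    # Toy sketch, not Stan's implementation: draw an index over the
    # trajectory states with probability proportional to exp(-H),
    # stabilized by subtracting the minimum energy before exponentiating.
    e = np.asarray(energies, dtype=float)
    w = np.exp(-(e - e.min()))
    return int(rng.choice(len(e), p=w / w.sum()))
```

In practice this weighting is applied recursively as subtrees are built, but the selection probability per state is the same.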

If PyMC isn’t doing the multinomial sampling, they really should. The nested U-turn condition is less important, but also useful for preventing overly long trajectories. I can’t even easily describe what that does (a sign that I don’t understand what it’s doing well enough).

Getting to the bottom of these things is good, especially if you can write them up for others.

I would urge moderation: “nitpicky as possible” is like “safety first”; if we really meant it, nothing would ever get started, much less finished.


Correction: PyMC doesn’t yet use nutpie’s new adaptation scheme, although we want to port it over, as it can get us up to speed with far fewer evaluations.

There are actually different tuning strategies one can use via the init_nuts argument.


The default PyMC sampler is almost identical to the Stan sampler; it does use multinomial sampling, for instance.
There are some slight differences in the adaptation windows, but I don’t think those change much.
