Adaptation phases of PyMC3 HMC NUTS sampler

Hello, I’m hoping to discuss and learn more about the specifics of the adaptation phase(s) of the core HMC NUTS sampler, as implemented and used in PyMC3. My goal is to implement a similar adaptation routine in the HMC sampler of the “nimble” R package.

As a starting point, I am referencing the NUTS algorithm as outlined in Hoffman and Gelman (2014). As I understand, that reference provides an explanation of adaptively setting the leapfrog stepsize, as well as the number of leapfrog integration steps (the “no u-turn” aspect).

However, it does not address dynamic adaptation of the “mass matrix”, or what I believe Stan calls the “metric”, which defines the covariance matrix (perhaps diagonal) used for drawing the initial momentum vector ( p ) on each HMC iteration.

The heart of my questions relates, generally, to the HMC adaptation scheme PyMC3 uses:

  • What are the main adaptation phase(s) for the HMC tuning parameters?
  • Which of the stepsize and the mass matrix (metric) are adapted in each phase?
  • How, numerically, are these quantities adapted during each phase?
  • How long is each adaptation phase, or what determines when each begins and ends?

I am concurrently looking into exactly how Stan does this, loosely guided by this figure in the Stan reference manual, but I also wanted to consult the PyMC community, and I would welcome discussion with any developers, or a pointer to where I can find this detailed information in the PyMC documentation.

Thank you kindly for any help,
Daniel

Hi! Welcome!

I think that this talk by @colcarroll is the best reference about the stages of the warmup process in PyMC3. I am tagging him to confirm, just in case.

In addition, if you are interested in HMC-NUTS tuning, I would suggest also taking a look at this question on Stan discourse and the campfire library.

Yes, this talk/essay has lots of diagrams!

The Stan and PyMC3 methods of adaptation have more in common than I thought: both adapt step size all the way through, and run foreground/background mass matrix adaptation windows. The literature on the adaptation windows is a little thin: for HMC you might see [1206.1901] MCMC using Hamiltonian dynamics, section 4.1, or section 4.2 of [1701.02434] A Conceptual Introduction to Hamiltonian Monte Carlo

These methods are mostly designed around NUTS – note that for HMC you are also thinking about integration time!
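
To make the foreground/background idea concrete, here is a minimal sketch of one way such a windowed estimate of a diagonal mass matrix could work. It is an illustration only, not the actual PyMC3 or Stan code: the class names `RunningVariance` and `WindowedDiagAdapt`, the fixed window length of 100, and the use of independent normal draws in place of real warmup samples are all assumptions made for the example.

```python
import numpy as np


class RunningVariance:
    """Welford-style running estimate of a per-dimension variance."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / (self.n - 1)


class WindowedDiagAdapt:
    """Hypothetical foreground/background variance adaptation.

    Both estimators see every draw. The foreground estimator supplies the
    current variance estimate. At the end of each window the background
    estimator is promoted to foreground and a fresh background is started,
    so old (possibly pre-convergence) draws are eventually forgotten.
    """

    def __init__(self, dim, window=100):
        self.dim = dim
        self.window = window
        self.foreground = RunningVariance(dim)
        self.background = RunningVariance(dim)
        self.var = np.ones(dim)  # start from an identity metric
        self.n_seen = 0

    def update(self, sample):
        self.foreground.add(sample)
        self.background.add(sample)
        self.n_seen += 1
        if self.foreground.n > 1:
            self.var = self.foreground.variance()
        if self.n_seen % self.window == 0:
            self.foreground = self.background
            self.background = RunningVariance(self.dim)


# Stand-in for warmup draws: independent normals with known scales (assumption).
rng = np.random.default_rng(1)
true_sd = np.array([1.0, 3.0])
adapt = WindowedDiagAdapt(dim=2, window=100)
for _ in range(2000):
    adapt.update(rng.standard_normal(2) * true_sd)
est_var = adapt.var
print(est_var)
```

In a real sampler the estimated variances would typically serve as the diagonal of the inverse mass matrix, and step size adaptation would keep running alongside these window updates.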

@OriolAbril @colcarroll Thank you both very much for the pointers. These look extremely useful. I’ll take some time to digest them, and see how far I can get. Once again, I really appreciate you both taking the time to respond.

@OriolAbril @colcarroll Thank you both again. I’ve spent a fair amount of time reading over these materials, to better understand the HMC adaptation routines of both PyMC3 and Stan.

I have one remaining question, which is straightforward, but I want to make sure I get this correct. I know the mass matrix (or the metric, M) will be used to re-sample the momentum variables ( p ) at the onset of each HMC sampling iteration.

Assume we follow some strategy to adapt the metric (M) to resemble the covariance of the parameter space (q), that is, M resembles the empirical covariance of our HMC samples. Then, when we re-sample the momentum vector ( p ) from a zero-mean multivariate normal distribution, do we use M as the covariance of that multivariate normal, or as the precision?

I’m sorry to say that the answer to this is still not clear to me, even after poring over these materials. Thanks to anyone who can confidently answer this question.

Daniel

I am having trouble following your notation, but maybe this statement helps?

If your posterior distribution is \mathcal{N}(\mu, \Sigma), and you draw p \sim \mathcal{N}(0, \Sigma^{-1}), then the dynamics will be equivalent to those with q \sim \mathcal{N}(\mu, I) and p \sim \mathcal{N}(0, I). See page 22 of Neal for a discussion and derivation. The TensorFlow Probability unit tests make sure that using this kinetic energy adapts to the correct step size, which is an indirect way of confirming the internal dynamics.
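
Put differently, the statement above says the momentum is drawn with covariance \Sigma^{-1}, the precision of the posterior, so the inverse mass matrix that appears in the position update is \Sigma itself. The equivalence can be checked numerically with a small sketch. Everything here is an assumption made for illustration (the 2-D Gaussian target, the hand-written `leapfrog` helper, the step size and step count); it is not taken from PyMC3, Stan, or TensorFlow Probability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D Gaussian "posterior" N(mu, Sigma).
mu = np.array([1.0, -2.0])
Sigma = np.array([[4.0, 1.2], [1.2, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)
L = np.linalg.cholesky(Sigma)  # Sigma = L @ L.T


def leapfrog(q, p, grad_U, inv_mass, eps, n_steps):
    """Plain leapfrog integrator; inv_mass is the inverse mass matrix."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * eps * grad_U(q)          # initial half step for momentum
    for i in range(n_steps):
        q += eps * inv_mass @ p         # full step for position
        if i < n_steps - 1:
            p -= eps * grad_U(q)        # full step for momentum
    p -= 0.5 * eps * grad_U(q)          # final half step for momentum
    return q, p


def grad_U(q):
    # Gradient of the negative log density of N(mu, Sigma).
    return Sigma_inv @ (q - mu)


def grad_U_std(z):
    # Gradient of the negative log density of N(0, I).
    return z


# Start from the same point expressed in both coordinate systems.
z0 = rng.standard_normal(2)
r0 = rng.standard_normal(2)       # r ~ N(0, I)
q0 = mu + L @ z0
p0 = np.linalg.solve(L.T, r0)     # p = L^{-T} r, so p ~ N(0, Sigma^{-1})

# Original coordinates: momentum covariance Sigma^{-1}, inverse mass Sigma.
q1, p1 = leapfrog(q0, p0, grad_U, Sigma, eps=0.3, n_steps=10)
# Whitened coordinates: standard normal target, identity mass.
z1, r1 = leapfrog(z0, r0, grad_U_std, np.eye(2), eps=0.3, n_steps=10)

# Mapping the whitened trajectory back should reproduce the original one.
max_diff = np.max(np.abs(q1 - (mu + L @ z1)))
print(max_diff)
```

If the two trajectories agree up to floating-point error, that supports reading M as the covariance of the momentum draw, with M set to the inverse of the estimated posterior covariance.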