Why is sampling sometimes slow?

This behavior is usually caused by one of the chains finding a region of high curvature during warmup and adapting a very small step size in order to sample that region. The small step size then requires very many leapfrog steps to reach a U-turn in the lower-curvature regions of the posterior. Dropping the slow chain(s) biases the estimates away from the regions of high curvature. Whether you can live with that amount of bias is another question, but it’s hard to estimate how much bias there is without sampling.
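
Here is a minimal sketch (not Stan’s implementation) of why a small step size forces long trajectories: leapfrog on a 2D normal with scales 1 and 100, an identity mass matrix, and a simple U-turn check. The names `scales`, `grad_neg_log_density`, and `eps`, and the fixed starting point and momentum, are just illustrative choices.

```python
import numpy as np

# Leapfrog on N(0, diag(scales^2)) with a naive U-turn stopping rule,
# counting how many steps the badly scaled direction forces.
scales = np.array([1.0, 100.0])

def grad_neg_log_density(q):
    # gradient of -log p(q) for N(0, diag(scales^2)), dropping the constant
    return q / scales**2

q0 = np.zeros(2)          # start at the mode for a clean illustration
p = np.ones(2)            # fixed unit momentum in each coordinate
q = q0.copy()
eps = 1.0                 # about half the stability limit of 2 * min(scales)

steps = 0
while np.dot(q - q0, p) >= 0:                 # stop at the first U-turn
    p -= 0.5 * eps * grad_neg_log_density(q)  # half step on momentum
    q += eps * p                              # full step on position
    p -= 0.5 * eps * grad_neg_log_density(q)  # half step on momentum
    steps += 1

print(steps)  # roughly (pi / 2) * max(scales) / eps, i.e. on the order of 150 steps here
```

Shrink the smaller scale (and hence `eps`) further and the step count grows in proportion, which is exactly the slowdown described above.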

You can develop intuition with a multivariate normal that has a different scale for each variable and no correlation. The sampler has to set a small step size to handle the smallest scale, and then take a number of steps proportional to the largest scale divided by the smallest scale in order to traverse the direction with the largest scale. This is usually formalized through the condition number, the ratio of the largest to the smallest eigenvalue of the Hessian of the negative log density. In the multivariate normal example, that’s the square of the ratio of the largest scale to the smallest scale. The leapfrog algorithm requires step sizes smaller than $2 / \sqrt{\lambda^\textrm{max}}$, where $\lambda^\textrm{max}$ is the largest eigenvalue, though that’s right at the boundary of stability for the algorithm, and step sizes around half that maximum stable value tend to work better. These estimates are the basis for the step size tuning in Matt Hoffman and Pavel Sountsov’s parallel samplers.
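
A small numerical check of those quantities for the diagonal normal example; the names `sigma`, `hessian`, and `eps_max` are just for illustration.

```python
import numpy as np

sigma = np.array([1.0, 100.0])            # smallest and largest scales
hessian = np.diag(1.0 / sigma**2)         # Hessian of the negative log density
eigvals = np.linalg.eigvalsh(hessian)

condition = eigvals.max() / eigvals.min()
print(condition)                          # (100 / 1)**2 = 10000.0

eps_max = 2.0 / np.sqrt(eigvals.max())    # leapfrog stability limit 2 / sqrt(lambda_max)
print(eps_max)                            # equals 2 * min(sigma) = 2.0
print(eps_max / 2)                        # roughly half the stable limit, per the text
```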
