Warning when NUTS probability is greater than acceptance level?


Been working on a model for a while now and it’s been running daily without issue. Then today it started outputting the warning below.

...pymc3/step_methods/hmc/nuts.py:451: UserWarning: The acceptance probability in chain 0 does not match the target. It is 0.93159983116, but should be close to 0.8. Try to increase the number of tuning steps.

My understanding is that this warning should only be thrown when the probability is less than the target acceptance rate - is that correct?
If so, I’m not sure why it’s outputting the warning!

All the posteriors seem fine (as before), and the usual checks (sample_ppc and autocorr) still look fine as well.


NUTS outputs a warning when the acceptance rate is not close to 0.8 (or whatever value you set). In this case, you can increase the tuning (e.g. `tune=1000` in `pm.sample`).

I see now - a higher acceptance probability isn’t always better, because a value that is too high implies poor exploration of the posterior space. This notebook http://docs.pymc.io/notebooks/sampler-stats.html, together with `trace.mean_tree_accept.mean()` and `plt.plot(trace['step_size_bar'])` for different tuning lengths, helped clarify it all for me.
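To see the trade-off concretely, here is a toy sketch (plain HMC on a one-dimensional standard normal, written from scratch rather than using PyMC3): a larger leapfrog step size solves the Hamiltonian dynamics less precisely, so the energy error `dH` grows and the Metropolis acceptance probability `exp(min(0, -dH))` falls.

```python
import math
import random

# Toy HMC on a 1-D standard normal, logp(x) = -x**2 / 2 (not PyMC3 code).
# Larger leapfrog steps -> larger integration error -> lower acceptance.

def accept_prob(x, p, step, n_steps=10):
    """Metropolis acceptance probability of one leapfrog trajectory."""
    h0 = 0.5 * (p * p + x * x)            # initial Hamiltonian (energy)
    p -= 0.5 * step * x                   # initial half kick
    for i in range(n_steps):
        x += step * p                     # drift
        p -= (step if i < n_steps - 1 else 0.5 * step) * x  # kick (last is half)
    h1 = 0.5 * (p * p + x * x)            # final Hamiltonian
    return math.exp(min(0.0, h0 - h1))    # min(1, exp(-dH))

rng = random.Random(1)
starts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
mean_acc = {}
for step in (0.1, 0.8, 1.6):
    mean_acc[step] = sum(accept_prob(x, p, step) for x, p in starts) / len(starts)
    print(f"step_size={step:3}  mean acceptance = {mean_acc[step]:.3f}")
```

The small step size buys near-perfect acceptance, but each trajectory then covers less distance per unit of work - which is exactly why "acceptance as high as possible" is not the goal.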

Where 0.8 comes from is totally opaque to me though - guess I’ll have to go actually read the papers! :wink: (edit: The paper in question: https://arxiv.org/pdf/1411.6669.pdf)


I think you can argue that this warning is not necessary; of all the warnings the sampler can show at the moment, this is clearly the least worrying. (Just to be clear: only if the actual acceptance rate is higher than the target.)

There really isn’t a good reason why an acceptance rate of 0.8 is always the value you want. High acceptance rates mean we spend a lot of time solving the Hamiltonian very precisely, and at some point this just isn’t worth the effort. It is quite common to increase the target acceptance rate to something like 0.9 or even 0.99 for more complicated posterior geometries. So this warning means we are solving the ODE even more accurately than we said we wanted, which isn’t a problem in itself.

However, it also means something didn’t go quite as planned, because during tuning we usually expect to adapt the step size such that the acceptance rate matches the target. Usually this just means we’ve spent a bit of time needlessly, but it could also indicate that other parts of the adaptation (the mass matrix) didn’t quite work out, and that might actually be a problem.

So to summarise: don’t worry too much about this one, but to be safe, maybe increase the number of tuning steps a bit.
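What "adapt the step size such that the acceptance rate matches the target" means can be sketched with a toy version of the idea - a plain stochastic-approximation update, much simpler than the dual-averaging scheme NUTS actually uses, on a hand-written HMC sampler for a 5-dimensional standard normal (none of this is PyMC3's real implementation):

```python
import math
import random

# Toy tuning sketch: nudge the step size until the average acceptance
# rate matches the target.  Model: 5-D standard normal, hand-written HMC.

DIM, L = 5, 10                       # dimensions, leapfrog steps

def mean_acceptance(step, n=120, seed=0):
    """Average Metropolis acceptance over n random HMC trajectories."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(DIM)]
        p = [rng.gauss(0, 1) for _ in range(DIM)]
        h0 = 0.5 * sum(v * v for v in x + p)
        p = [pi - 0.5 * step * xi for pi, xi in zip(p, x)]   # half kick
        for i in range(L):
            x = [xi + step * pi for xi, pi in zip(x, p)]     # drift
            kick = step if i < L - 1 else 0.5 * step         # last kick is half
            p = [pi - kick * xi for pi, xi in zip(p, x)]
        h1 = 0.5 * sum(v * v for v in x + p)
        total += math.exp(min(0.0, h0 - h1))
    return total / n

target = 0.8
step = 0.05                          # deliberately far too small: acceptance ~1
for i in range(120):
    acc = mean_acceptance(step, seed=i)
    # stochastic approximation with a decaying rate: grow the step while
    # acceptance is above the target, shrink it when below
    step *= math.exp((acc - target) / math.sqrt(i + 1))
final_acc = mean_acceptance(step, n=500, seed=999)
print(f"adapted step size ≈ {step:.2f}, acceptance ≈ {final_acc:.2f}")
```

If tuning stops before this loop has settled, you end up with a step size whose acceptance rate is still away from the target - which is the situation the warning is about, and why "increase the number of tuning steps" is the suggested fix.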


About the source of the 0.8: I’m not sure anyone really knows where that exact value comes from. I think the Stan developers experimented a bit and concluded that it sounds like a decent default. We might even want to increase the default to 0.9 at some point, since that just seems somewhat more reliable in my experience. But I haven’t done any real experiments on that.

So just to check: an average acceptance rate of 0.8 means better exploration of the space than 0.9 would - is that not correct?

Ok maybe I just stop asking questions until I’ve read the NUTS papers… haha

But thanks for the further explanations - they’re insightful, even if I can’t quite stitch it all together yet.

I think there are three issues here, that are easy to mix up:

  • How large does the actual acceptance rate have to be, at a minimum, so that we sample from the right posterior (i.e. we don’t get any divergences)?
  • Among all the acceptance rates that work (again, no divergences), which one should I choose so that I don’t waste time by rejecting samples all the time, or by solving the Hamiltonian more precisely than I need to?
  • Did I find the right step size during tuning, such that the actual acceptance rate matches the one I specified (the target_accept parameter for NUTS)? This is what the warning is about.
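To make the first point concrete: the leapfrog integrator has a stability limit, and past it the energy error doesn’t just grow, it explodes - which is what gets reported as a divergence. A toy sketch on a standard normal posterior, where the stability limit for the step size is 2 (again hand-written, not PyMC3 code):

```python
import math

# Leapfrog energy error on logp(x) = -x**2 / 2.  Below the stability
# limit (step < 2) the error stays small and bounded; above it the
# error blows up exponentially -- a "divergence".

def energy_error(step, n_steps=50):
    x, p = 1.0, 0.0
    h0 = 0.5 * (p * p + x * x)            # initial Hamiltonian
    p -= 0.5 * step * x                   # half kick
    for i in range(n_steps):
        x += step * p                     # drift
        p -= (step if i < n_steps - 1 else 0.5 * step) * x  # kick
    return abs(0.5 * (p * p + x * x) - h0)

stable = energy_error(1.0)      # below the limit: small, bounded error
unstable = energy_error(2.1)    # above the limit: error explodes
print(f"step=1.0 -> |dH| = {stable:.3f}")
print(f"step=2.1 -> |dH| = {unstable:.3e}")
```

The first bullet is about staying safely below that blow-up regime; the second and third are about where, within the safe regime, the tuned step size should land.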

The paper you mentioned really only talks about the second point. Unless you have a theoretical interest in that exact topic, I don’t think it should be at the forefront of your thoughts when working with pymc3. Good parametrisations that give the posterior nice geometry, and computational efficiency and stability, are typically much more important.