What does the "maximum tree depth" warning mean?

Hi dear community!
I’m wondering about the maximum tree depth warning: "The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize."

  • I know what increasing target_accept and reparametrizing do, but I never encountered max tree depth – what does it hint at, and how do you handle it usually?

  • In particular, if you have no divergences and no r_hat warnings, but just effective sample size warnings, what does it say about your trace’s quality?

I can share the model I’m working on, but I’m more interested in general heuristics here – and I still need to test some things on my model to be officially stuck :stuck_out_tongue_winking_eye:
Thanks a bunch in advance for your helpful comments :vulcan_salute:


NUTS uses a binary-tree doubling scheme to decide how many leapfrog steps to take. It is a recursive algorithm that stops only when the trajectory makes a U-turn (or when a divergence occurs). Of course, in practice we won't run the recursion forever, so there is an upper limit on how many times the tree is doubled (usually 10 by default). When you reach the maximum tree depth and still have not hit a U-turn, you get this warning.
It does not affect the correctness of the sampler, but since the trajectory is effectively cut short, your sampler is not as efficient. Usually, this is an indication that NUTS is taking steps that are too small: at a depth of 10, that is already up to 2^10 - 1 = 1023 leapfrog steps per draw. You can also try increasing the number of tuning steps so that the mass matrix is adapted a bit better.
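
To make the doubling concrete, here is a minimal pure-Python sketch. It is not the real NUTS implementation (real NUTS doubles in a random direction, checks U-turns on subtrees, and samples a point from the tree); it only extends a trajectory forward on a 1-D standard normal, doubles its length at each depth, and stops at a U-turn or at the depth limit. The names `leapfrog` and `grow_trajectory` and the start point `q0 = p0 = 1.0` are made up for illustration.

```python
def leapfrog(q, p, step_size):
    """One leapfrog step for a 1-D standard normal target (grad log p(q) = -q)."""
    p = p - 0.5 * step_size * q   # half step for momentum
    q = q + step_size * p         # full step for position
    p = p - 0.5 * step_size * q   # second half step for momentum
    return q, p


def grow_trajectory(step_size, max_treedepth=10, q0=1.0, p0=1.0):
    """Double the trajectory until it U-turns or max_treedepth is reached.

    Returns (depth, n_leapfrog, hit_max_treedepth).
    Simplified, forward-only illustration; not the actual NUTS tree building.
    """
    q, p = q0, p0
    n_leapfrog = 0
    for depth in range(1, max_treedepth + 1):
        # Doubling: depth d adds 2**(d-1) steps, for 2**d - 1 steps in total.
        for _ in range(2 ** (depth - 1)):
            q, p = leapfrog(q, p, step_size)
            n_leapfrog += 1
        # U-turn check: the end point starts moving back toward the start.
        span = q - q0
        if span * p < 0 or span * p0 < 0:
            return depth, n_leapfrog, False
    # No U-turn within the allowed depth: this is the case that triggers the warning.
    return max_treedepth, n_leapfrog, True


print(grow_trajectory(step_size=0.1))
print(grow_trajectory(step_size=0.0005))
```

In this toy setting, a step size of 0.1 turns around at depth 4 after 15 leapfrog steps, while a step size of 0.0005 spends all 1023 steps at depth 10 without ever turning, which is the situation the warning describes.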


Ow ok, thanks Junpeng, that’s very clear!
From what you said about NUTS taking steps that are too small:

Can I conclude that one way to avoid reaching max tree depth would actually be to reduce target_accept, since NUTS would then take bigger steps?

Yes and no. In theory that should be the case, but in practice the reason you get a small step size in the first place is that your posterior has regions of high curvature, and you do need a small step size to sample those.
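
As a rough illustration of why high curvature forces a small step size, here is a small pure-Python sketch. It assumes a 1-D normal target with standard deviation `sigma` (small `sigma` means high curvature) and a unit mass matrix, and measures how badly the leapfrog integrator fails to conserve energy. The function name `energy_error` and the chosen numbers are illustrative only; in real NUTS a large energy error shows up as low acceptance or divergences, and adaptation reacts by shrinking the step size.

```python
def energy_error(step_size, sigma, n_steps=20):
    """Absolute energy error of leapfrog on a N(0, sigma**2) target.

    Potential U(q) = q**2 / (2 * sigma**2), kinetic energy p**2 / 2.
    Illustrative helper, not part of any sampler's API.
    """
    q, p = sigma, 1.0
    h_start = 0.5 * (q / sigma) ** 2 + 0.5 * p ** 2
    for _ in range(n_steps):
        p -= 0.5 * step_size * q / sigma ** 2   # half step for momentum
        q += step_size * p                      # full step for position
        p -= 0.5 * step_size * q / sigma ** 2   # second half step for momentum
    h_end = 0.5 * (q / sigma) ** 2 + 0.5 * p ** 2
    return abs(h_end - h_start)


print(energy_error(step_size=0.1, sigma=1.0))     # gentle curvature: small error
print(energy_error(step_size=0.1, sigma=0.01))    # high curvature: integrator blows up
print(energy_error(step_size=0.001, sigma=0.01))  # high curvature, tiny step: small again
```

The same step size that works fine for `sigma = 1.0` is unstable for `sigma = 0.01`, and only a much smaller step brings the error back down. That smaller step then has to be used everywhere, which is how trajectories end up needing many doublings.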


Mmmh ok :thinking:
And I know sometimes switching to a non-centered parametrization actually hurts sampling (e.g. it seems to be the case in my model). So if I followed you correctly, that means that the non-centered parametrization has an even higher curvature than the centered one – in these cases, is your only choice to increase target_accept and just sample from the centered model for a (very) long time?

I would say most likely, but in reality there is more nuance to that. You could have a centered parameterization with pretty nasty geometry, where adaptation just ignores that and estimates the covariance around the mode. The result is that although the non-centered posterior has a nicer geometry overall (smoother, with less curvature), it has higher curvature around the mode and thus ends up with a small step size after adaptation.

In general, so many things could be happening that it is hard to say what the situation is for your model. You could dig deeper using e.g. a pair plot of the posterior samples for more insight.
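
Before digging into pair plots, it can also help to quantify how often the limit is actually hit. Below is a minimal sketch of a helper that takes the per-draw tree depths and summarizes them. The helper name `treedepth_report` is hypothetical, and where the depths come from is an assumption: in PyMC with ArviZ they are usually available in the sample stats under a name like `tree_depth`, but check what your version records.

```python
def treedepth_report(tree_depths, max_treedepth=10):
    """Summarize how many draws reached the maximum tree depth.

    tree_depths: iterable of per-draw tree depths (ints), flattened over chains.
    Hypothetical helper for illustration; not part of PyMC or ArviZ.
    """
    depths = list(tree_depths)
    if not depths:
        raise ValueError("tree_depths is empty")
    n_saturated = sum(d >= max_treedepth for d in depths)
    return {
        "n_draws": len(depths),
        "n_saturated": n_saturated,
        "fraction_saturated": n_saturated / len(depths),
        # Upper bound on leapfrog steps per draw at the depth limit.
        "max_leapfrog_per_draw": 2 ** max_treedepth - 1,
    }


# Made-up depths for illustration. With real output one might pass something like
# idata.sample_stats["tree_depth"].values.ravel() (assumed name, see above).
print(treedepth_report([3, 4, 10, 10, 5]))
```

If only a handful of draws saturate, the cost is mostly a bit of lost efficiency; if most of them do, every draw is paying for the maximum number of gradient evaluations and the geometry deserves a closer look.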


Thanks Junpeng, this thread was really helpful! I’ll mark it as solved, as I just wanted a conceptual and high-level understanding of the max tree depth warning, that I and the community could refer to when getting this warning :wink:
