I’m trying to use a hierarchical model to analyze my data, but I’m running into a lot of divergences.
I’m basing the model specification on a blog post by Thomas Wiecki (https://twiecki.io/blog/2017/02/08/bayesian-hierchical-non-centered/). Also see a full working example of my use case below.
I don’t really understand the reason for sigma_a and sigma_b in this specification. The blog post says they can indicate confidence in the offsets from the mean, but I think that’s also indicated by the distribution on those actual offsets. Further, it seems to me that they will correlate with a_offset and b_offset.
For example, sigma_a could increase a lot as long as a_offset shrinks towards 0 at the same time. When looking at the traces, the sigmas also seem to show a funnel tendency similar to the one described in the blog post.
If I remove the sigmas from the model specification, the problem with divergences disappears, so this seems to be at least part of the problem. What are sigma_a and sigma_b useful for, and what am I giving up by removing them?
EDIT FOR CLARIFICATION:
Note that the model is parametrized as:
b = mu_b + b_offset * sigma_b
and analogously for a. In this case it seems like b_offset and sigma_b are tightly coupled. Say that the total offset from mu_b (that is, b_offset * sigma_b) should be 1; that can be modelled either as b_offset = 1, sigma_b = 1, or as b_offset = 1000, sigma_b = 0.001. This seems like an identifiability problem, and I don’t see what in the model makes sigma_b higher or closer to 1 as we gain more confidence.
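To make the coupling concrete, here is a minimal sketch in plain Python (not my actual model). The value of mu_b and the helper names slope and std_normal_logpdf are made up for illustration, and I am assuming a standard normal prior on b_offset, as in the blog post. It shows that both combinations produce the same b, so the likelihood alone cannot tell them apart, and it also prints the log-density that the assumed prior assigns to each b_offset.

```python
import math


def std_normal_logpdf(x):
    """Log-density of a standard normal (the prior assumed here for b_offset)."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def slope(mu_b, b_offset, sigma_b):
    """Non-centered parametrization: b = mu_b + b_offset * sigma_b."""
    return mu_b + b_offset * sigma_b


mu_b = 0.0  # hypothetical group-level mean, chosen only for illustration

# The two (b_offset, sigma_b) combinations from the question.
candidates = [(1.0, 1.0), (1000.0, 0.001)]

results = []
for b_offset, sigma_b in candidates:
    b = slope(mu_b, b_offset, sigma_b)
    logp = std_normal_logpdf(b_offset)
    results.append((b, logp))
    print(f"b_offset={b_offset}, sigma_b={sigma_b}: b={b:.6f}, "
          f"log p(b_offset)={logp:.3f}")
```

If the standard normal prior on b_offset is what is supposed to separate the two cases, the second combination should come out far less probable a priori even though b is identical, but I am not sure whether that is the intended mechanism.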