Multi-level hierarchical model X-level deep

I was reading Bayesian Analysis with Python, and the author mentions that hierarchical levels are fine but that you should not go more than one level deep. He says it makes the model harder to interpret but doesn’t expand much on the subject.

For example, I have a dataset with several hierarchical levels:

  1. Franchise
    1.1 Auditors
    1.2 Stores
      1.2.1 Inspection (each row in the data set is an inspection)
  2. Cities

Right now I allow the intercept to vary per franchise, auditor, store and city. But I have multiple franchises, and auditors and stores each belong to a franchise. So I was thinking of adding a higher level to the hierarchy: the shared priors for stores and auditors would sit at the franchise level, and a further set of priors above that would pool the franchises together.
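
As a rough sketch of the nested structure described above, here is a forward simulation using only the standard library. It is not a fitted model, and the function name, group counts and scale values are all made up for illustration. Each franchise mean is drawn around a global mean, and each store intercept is drawn around its own franchise mean:

```python
import random


def simulate_nested_intercepts(n_franchises=3, stores_per_franchise=4, seed=0):
    """Draw store intercepts from a two-level nested prior (hypothetical scales)."""
    rng = random.Random(seed)
    mu_global = 0.0        # population-level mean (assumed value)
    sigma_franchise = 1.0  # spread of franchise means around mu_global (assumed)
    sigma_store = 0.5      # spread of stores around their franchise mean (assumed)

    intercepts = {}
    for f in range(n_franchises):
        # Top level: franchises are pooled toward the global mean.
        franchise_mean = rng.gauss(mu_global, sigma_franchise)
        for s in range(stores_per_franchise):
            # Lower level: stores are pooled toward their own franchise mean.
            intercepts[(f, s)] = rng.gauss(franchise_mean, sigma_store)
    return intercepts


store_intercepts = simulate_nested_intercepts()
```

Auditors could be handled the same way as stores, with their own scale parameter under each franchise.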

Intuitively this would make sense to me, but it doesn’t seem to be the recommended way. Why would this be a bad idea?


Sorry for the confusion. I think the key is “unless the problem really demands more structure, adding more levels does not help to make better inferences”. I was trying to say that even though in principle you can always put a prior over a prior over a prior… ad infinitum, the levels should match your problem. In your example, pooling information at the franchise level seems useful.


Thanks for your reply! This was not the first time I had heard this. I had a discussion with one of the developers of BAMBI about whether they supported the lme4 syntax for specifying multiple levels:

  • (x|site/block) where x has slope and intercept varying among sites and among blocks within sites.
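
For reference, in lme4 the term `(x|site/block)` is shorthand for `(x|site) + (x|site:block)`, so the nesting amounts to a second grouping factor that identifies each block within its site. The sketch below builds that combined label by hand. The helper name and the toy data are hypothetical, and whether a given library accepts such a column as a grouping factor is an assumption to check against its documentation:

```python
def nested_group_labels(sites, blocks):
    """Return the two grouping factors implied by the nesting site/block."""
    if len(sites) != len(blocks):
        raise ValueError("sites and blocks must have the same length")
    site_labels = [str(s) for s in sites]
    # Block 1 in site A is a different group from block 1 in site B,
    # so the block label is made unique by prefixing it with its site.
    site_block_labels = [f"{s}:{b}" for s, b in zip(sites, blocks)]
    return site_labels, site_block_labels


sites = ["A", "A", "B", "B"]
blocks = [1, 2, 1, 2]
site_labels, site_block_labels = nested_group_labels(sites, blocks)
```

With the toy data above, the two blocks numbered 1 end up as distinct groups ("A:1" and "B:1") instead of being treated as the same block.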

Unfortunately this was not supported in BAMBI, and Jake mentioned that I would not get any benefit from using it anyway. Then I read your book, which said basically the same thing, so I thought there was a common theme there.

I might give it a try then.


One potential concern is that the structure is too complex for the amount of information in the data. In that case you need more informative priors, especially on the group variance (e.g., see the first paragraph here on Andrew Gelman’s post).
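
To illustrate why the group variance is the delicate part, here is a small standard-library simulation. It is only a sketch under optimistic assumptions: the group effects are treated as directly observed, and the true standard deviation and group counts are made up. It measures how much an estimate of the between-group standard deviation varies across repeated datasets when there are few groups versus many:

```python
import random
import statistics


def spread_of_sd_estimates(n_groups, true_sd=1.0, n_reps=2000, seed=0):
    """Standard deviation, across simulated datasets, of the estimated group SD."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        # Draw one set of group effects and estimate their spread.
        group_effects = [rng.gauss(0.0, true_sd) for _ in range(n_groups)]
        estimates.append(statistics.stdev(group_effects))
    return statistics.stdev(estimates)


few = spread_of_sd_estimates(n_groups=3)    # e.g. a handful of franchises
many = spread_of_sd_estimates(n_groups=30)  # e.g. many stores
```

The estimate is much less stable with 3 groups than with 30, which is the situation where an informative prior on the group-level scale contributes the most.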