I have a hierarchical regression model that uses the marketing hierarchy of each item sold to inform the parameters. There are six levels total for each parameter.
Can having more levels affect the model in a negative way? When looking at my autocorrelation plot, I get the following:
Wondering if having too many levels can present problems or if I’m grasping at straws.
Can you clarify what you mean by levels? Are you talking about the number of elements in a random effect (so, indices [0, 1, 2, 3, 4, 5])? When I think of levels, I’m referring to the number of variables that are functions of other variables, so a three-level hierarchy is z = f(y), y = f(x) and x = f(). If it’s the former, then more is generally better because you can better estimate the hyperparameters of the random effect. If it’s the latter, then you tend to run out of information for estimating the hyperparameters at some of the levels unless you have a lot of data to inform the various parameters.
Thank you. The latter. I’m forecasting item sales. Item sales are a function of their marketing hierarchy, which runs from sales to industry cut to subcategory to category to business line.
Yeah, six levels is a very deep hierarchy. This is fine provided that you have data to inform the processes operating at each level. For example, in a geographic analysis you may want a model that nests household within city within county within country, but that requires data from multiple households from multiple cities from multiple counties from multiple countries. If that’s not a problem in your case, then it may just be a model misspecification issue or some other technical problem.
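One rough way to check whether each level has data to inform it is to count how many distinct child groups sit under each parent group. The sketch below uses made-up rows and only four of the levels; the names, the data, and the "fewer than 2 children" threshold are all assumptions for illustration, not a rule.

```python
from collections import defaultdict

# Hypothetical item rows: (item, subcategory, category, business_line).
rows = [
    ("item_a", "sub_1", "cat_1", "bl_1"),
    ("item_b", "sub_1", "cat_1", "bl_1"),
    ("item_c", "sub_2", "cat_1", "bl_1"),
    ("item_d", "sub_3", "cat_2", "bl_1"),
    ("item_e", "sub_4", "cat_3", "bl_2"),
]
levels = ["item", "subcategory", "category", "business_line"]


def children_per_parent(rows, child_idx, parent_idx):
    """Count the distinct child groups nested under each parent group."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[parent_idx]].add(row[child_idx])
    return {parent: len(children) for parent, children in groups.items()}


# Parents with fewer than 2 children contribute little information about
# the spread (hyperparameters) of their level. The cutoff is arbitrary.
thin = {}
for child_idx in range(len(levels) - 1):
    parent_name = levels[child_idx + 1]
    counts = children_per_parent(rows, child_idx, child_idx + 1)
    thin[parent_name] = sorted(p for p, c in counts.items() if c < 2)
    print(parent_name, counts, "thin:", thin[parent_name])
```

If most parents at a given level show up as thin, the hyperparameters for that level are probably being driven mostly by the prior.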
The original plots posted don’t necessarily indicate any problems with sampling. It’s fine for samples to be autocorrelated, especially if the target_accept was set quite close to 1 (inducing small step sizes during HMC). It just means there will be a lower effective sample size in the posterior, and you may need to run the chains longer. But as long as the chains are converging and mixing without divergences, I’d say that autocorrelations aren’t themselves a problem.
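To make the effective sample size point concrete, here is a minimal sketch using simulated AR(1) series as stand-ins for a sampler trace. The estimator is a deliberately simplified one (sum the autocorrelations until the first negative value); it is not what ArviZ computes, and in practice you would use something like az.ess on the real trace. The function names and the AR(1) setup are assumptions for illustration.

```python
import numpy as np


def autocorr(x, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[: n - k], x[k:]) / (n * var) for k in range(max_lag + 1)])


def ess_simple(x, max_lag=200):
    """Crude ESS: n / (1 + 2 * sum of autocorrelations up to the first negative one)."""
    rho = autocorr(x, max_lag)
    tau = 1.0
    for r in rho[1:]:
        if r < 0:
            break
        tau += 2.0 * r
    return len(x) / tau


def ar1(n, phi, rng):
    """Stationary AR(1) series with unit marginal variance and lag-1 correlation phi."""
    x = np.empty(n)
    x[0] = rng.normal()
    noise = rng.normal(size=n) * np.sqrt(1.0 - phi**2)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x


rng = np.random.default_rng(0)
n = 4000
ess_low = ess_simple(ar1(n, 0.0, rng))   # nearly independent draws
ess_high = ess_simple(ar1(n, 0.9, rng))  # strongly autocorrelated draws
print(f"draws={n}  ESS(phi=0.0)={ess_low:.0f}  ESS(phi=0.9)={ess_high:.0f}")
```

Both series have the same number of draws and the same target distribution; the autocorrelated one just carries far fewer effective samples, which is why running longer is usually the first thing to try.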
Thank you. I do actually have plenty of data for these levels so I’m guessing model misspecification. Thanks again!
Ok. I was reading this: Exploratory-Analysis-of-Bayesian-Models/content/Section_02/Visual_diagnostics.ipynb at main · arviz-devs/Exploratory-Analysis-of-Bayesian-Models · GitHub and thought the autocorrelation may indicate issues. There are no divergences, so that’s a good thing, but I’m finding that as I back variables out of the model and autocorrelation decreases, so does RHat. When my autocorrelation is high, RHat also seems to be unreasonably high.
High autocorrelation can lead to high rhat because the chains aren’t able to efficiently explore the space of the posterior, so given a limited number of draws they can’t mix together well. Obviously it’s not a good situation, but it’s also not necessarily a sign of a pathology in the way that divergences are. It could just mean you need to run the sampler longer. Again, autocorrelation lowers your effective sample size. With NUTS we’re used to each draw being worth roughly one effective sample, but with high autocorrelation you need many draws to get the information of one effective sample. So it’s inefficient, but again, not necessarily biased.
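Here is a small simulation of that effect, as a sketch rather than anything definitive. Four chains are drawn from the same stationary AR(1) process, so there is no bias by construction, and a basic split R-hat is computed for a short run and a long run. The split R-hat below is a plain textbook version without rank normalization, so it won't match az.rhat exactly; the chain lengths and phi=0.99 are arbitrary choices.

```python
import numpy as np


def ar1_chains(n_chains, n_draws, phi, rng):
    """Independent stationary AR(1) chains, all targeting the same N(0, 1) marginal."""
    x = np.empty((n_chains, n_draws))
    x[:, 0] = rng.normal(size=n_chains)
    noise = rng.normal(size=(n_chains, n_draws)) * np.sqrt(1.0 - phi**2)
    for t in range(1, n_draws):
        x[:, t] = phi * x[:, t - 1] + noise[:, t]
    return x


def split_rhat(chains):
    """Basic split R-hat: halve each chain, then compare between/within variance."""
    chains = np.asarray(chains, dtype=float)
    half = chains.shape[1] // 2
    splits = np.concatenate([chains[:, :half], chains[:, half : 2 * half]], axis=0)
    within = splits.var(axis=1, ddof=1).mean()        # W
    between = half * splits.mean(axis=1).var(ddof=1)  # B
    var_hat = (half - 1) / half * within + between / half
    return np.sqrt(var_hat / within)


rng = np.random.default_rng(42)
phi = 0.99  # very high autocorrelation

rhat_short = split_rhat(ar1_chains(4, 200, phi, rng))
rhat_long = split_rhat(ar1_chains(4, 20_000, phi, rng))
print(f"R-hat with 200 draws per chain:    {rhat_short:.2f}")
print(f"R-hat with 20,000 draws per chain: {rhat_long:.2f}")
```

The short run looks unconverged only because each chain has barely moved; with enough draws the same process gives an R-hat close to 1. If a real model's R-hat stays high no matter how long it runs, that points to something other than plain autocorrelation.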