HI @AlexAndorra, thanks again for your suggestions on this! I did try fitting the model with a negative binomial but the issues I was facing in my previous post were not resolved (predictions weren’t very good and there were many divergences unless I included a variance parameter for each observation in the linear term as described in this post).
I’m not sure I follow on the hierarchical structure comment. How would you define the clusters? Would you just use the intersection of all the factor variables to define the clusters and find separate intercepts and coefficients for the covariates within each cluster? How would this be different than the dummy variable approach?
Please excuse my ignorance on this. I have definitely looked at some of the tutorials on hierarchical regression and partial pooling on the pymc3 website, but I’m still not clear on the differences between the hierarchical approach and using dummy variables with an overall intercept.