Train/test splits with a multi-level hierarchical regression model where the test set contains unseen values in the hierarchy

After reading this post, I decided to experiment with a model factory. The approach I believe will work is to add rows to both the train set and the test set covering the region, state, and county combinations that the other set is missing, with the y values for those rows set to np.nan.
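Here is a minimal sketch of that padding step in pandas, not a tested solution. The column names (region, state, county, y), the toy data, and the helper pad_with_missing_levels are all made up for illustration. Any other predictor columns would also come out as NaN in the padded rows and would need filling before the data goes into a model.

```python
import numpy as np
import pandas as pd

# Assumed column names for the three hierarchy levels (hypothetical).
HIERARCHY = ["region", "state", "county"]


def pad_with_missing_levels(target, other, y_col="y"):
    """Append to `target` one row for each hierarchy combination that
    appears only in `other`, with the y value set to NaN."""
    combos_target = target[HIERARCHY].drop_duplicates()
    combos_other = other[HIERARCHY].drop_duplicates()
    # Left join with an indicator column to find combinations missing from `target`.
    merged = combos_other.merge(
        combos_target, on=HIERARCHY, how="left", indicator=True
    )
    missing = merged.loc[merged["_merge"] == "left_only", HIERARCHY].copy()
    missing[y_col] = np.nan
    return pd.concat([target, missing], ignore_index=True)


# Toy data, invented purely to show the mechanics.
train = pd.DataFrame({
    "region": ["West", "West", "East"],
    "state": ["CA", "CA", "NY"],
    "county": ["Alameda", "Marin", "Kings"],
    "y": [1.0, 2.0, 3.0],
})
test = pd.DataFrame({
    "region": ["West", "East"],
    "state": ["OR", "NY"],
    "county": ["Lane", "Kings"],
    "y": [4.0, 5.0],
})

train_padded = pad_with_missing_levels(train, test)
test_padded = pad_with_missing_levels(test, train)
print(train_padded)
print(test_padded)
```

After padding, both frames contain the same set of region/state/county combinations, so the index coding for each level can be built identically for the train and test models.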

That being said, I can’t post this as a solution yet, because I have discovered an issue with the base model: the a_state and a_region coefficients make no sense. I believe they should be centered around the average value of the observations in their state and region, respectively, but they are not. Once I’ve found what’s wrong and have the missing values figured out, I’ll post an update.
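For reference, the baseline I'd compare the coefficients against can be sketched as plain group means in pandas. The data and column names below are invented for illustration. The comparison also assumes a model with no other predictors in which a_state and a_region are absolute levels. If a_state is parameterized as an offset from a_region, it would instead be expected to sit near the state mean minus the region mean.

```python
import pandas as pd

# Toy observations (hypothetical), one row per observation.
df = pd.DataFrame({
    "region": ["West", "West", "West", "East", "East"],
    "state": ["CA", "CA", "OR", "NY", "NY"],
    "y": [1.0, 2.0, 4.0, 3.0, 5.0],
})

# Raw averages of y within each state and each region.
state_means = df.groupby("state")["y"].mean()
region_means = df.groupby("region")["y"].mean()

print(state_means)
print(region_means)
```

Partial pooling will shrink the fitted values toward the higher-level mean, so an exact match isn't expected, but estimates far from these averages would point to a problem such as mismatched index coding.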