Out-of-sample prediction with new category

Your last idea is basically correct. You need to add an “other” category to your training data, and use it to make predictions for any category that does not appear in the training data. You don’t want it to be just an empty “fake index”, though: if no training observations are assigned to it, doing inference with it is pointless.
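To make the indexing concrete, here is a minimal sketch of one way to do it. It assumes you are willing to lump rare training categories into “other” so that the index actually has data behind it; the helper names (`fit_category_index`, `encode`) and the `min_count` threshold are made up for illustration and are not part of any library.

```python
from collections import Counter

import numpy as np


def fit_category_index(train_labels, min_count=2, other="other"):
    """Build a label -> integer lookup from the training labels.

    Categories seen fewer than `min_count` times are folded into `other`
    (an assumption of this sketch), so `other` is backed by real rows.
    """
    counts = Counter(train_labels)
    kept = sorted(c for c, n in counts.items() if n >= min_count and c != other)
    lookup = {c: i for i, c in enumerate(kept)}
    lookup[other] = len(kept)  # "other" always gets the last index
    return lookup


def encode(labels, lookup, other="other"):
    """Map labels to indices; anything not in the lookup goes to `other`."""
    return np.array([lookup.get(c, lookup[other]) for c in labels])


train = ["A", "A", "B", "B", "B", "C"]    # "C" is rare, so it becomes "other"
lookup = fit_category_index(train)
train_idx = encode(train, lookup)         # use this as the index when fitting
new_idx = encode(["A", "D"], lookup)      # "D" was never seen, so it is "other"
```

The same `lookup` is used for training and prediction, so an unseen label lands on an index the model has already estimated.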

Another option is to set all the county offsets to zero, which amounts to using the hyper-mean (a in the example code) to say something about an unknown county.
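As a rough sketch of the zero-offset idea, assuming the model is parameterised as hyper-mean plus county offset: the arrays below are random stand-ins for posterior draws (in practice they would come from your fitted trace), and `predict_mean` is a hypothetical helper, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, n_counties = 1000, 3

# Stand-in posterior draws, NOT real results: replace with draws from your trace.
a = rng.normal(1.5, 0.1, size=n_draws)                      # hyper-mean
offsets = rng.normal(0.0, 0.3, size=(n_draws, n_counties))  # county offsets


def predict_mean(a, offsets, county_idx=None):
    """Posterior draws of the expected value for one county.

    With county_idx=None the offset is taken to be zero, so the
    prediction for an unknown county is just the hyper-mean.
    """
    if county_idx is None:
        return a.copy()
    return a + offsets[:, county_idx]


known = predict_mean(a, offsets, county_idx=1)   # county seen in training
unknown = predict_mean(a, offsets)               # county not in training
```

Note that this gives the expected value for a typical county; it does not add the between-county spread to the uncertainty.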
