Hi, I’m building a logistic regression model and I have a lot of categorical variables that I one-hot encoded.
My model keeps diverging.
What type of prior should I choose to help it along a little?
Thanks a lot for any insights, tips and hints
What type of data is it? Are the categories closely related?
If not, one option is to recode the categorical variables by clustering categories that follow a similar pattern into broader classes.
Otherwise, try PCA or SVD for dimensionality reduction.
You can also try L2 regularisation.
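Since the original question asks about priors, it may help to see what L2 regularisation does here. Penalising the squared coefficients gives the same point estimate as the posterior mode under a zero-mean Normal prior on them. Below is a minimal pure-Python sketch, not a recommended implementation: the function name `fit_logistic_l2`, the penalty strength `lam`, the learning rate, the step count and the toy data are all assumptions made up for illustration. With the scaling used, `lam` corresponds to a prior standard deviation of `1 / sqrt(lam)`.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_l2(X, y, lam=1.0, lr=0.1, steps=2000):
    """Gradient descent on mean log-loss + (lam / 2n) * sum(w_j ** 2).

    Hypothetical helper for illustration; the intercept b is not penalised.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        # The lam * wj term is the gradient of the L2 penalty.
        w = [wj - lr * (gj + lam * wj) / n for wj, gj in zip(w, grad_w)]
    return b, w

# Toy data (assumed): full one-hot encoding of three levels plus an intercept,
# so the design is perfectly collinear, and level "a" is perfectly separated.
levels = ["a", "b", "c"]
cats = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
y = [1, 1, 1, 1, 0, 0, 0, 1, 0]
X = [[1.0 if c == lvl else 0.0 for lvl in levels] for c in cats]

b, w = fit_logistic_l2(X, y, lam=1.0)
print("intercept:", b)
print("coefficients:", w)
```

Without the penalty, the coefficient for level "a" would keep growing because every observation in that level has the same outcome; the penalty keeps the estimates finite even though the columns are redundant.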
Hi Oliver-
If you’ve one-hot encoded a lot of categorical variables, it is likely that your model suffers from multicollinearity, meaning that one or more coefficients are non-identifiable. See this post for more information.
The standard recommendation would be to identify the source(s) of multicollinearity and either combine or drop these features to ensure model identifiability.
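To make the identifiability point concrete, here is a small sketch in plain Python. The helper names `one_hot` and `linear_predictor`, the colour data and the coefficient values are all hypothetical, chosen only for illustration. It shows that with an intercept and a full one-hot encoding, two different parameter sets produce the same linear predictor, and that dropping one level as a reference removes the redundant column.

```python
def one_hot(values, levels, drop_first=False):
    """Indicator columns for `values`; optionally omit the first level as the reference."""
    kept = levels[1:] if drop_first else levels
    return [[1.0 if v == lvl else 0.0 for lvl in kept] for v in values]

def linear_predictor(intercept, coefs, rows):
    # intercept + sum_j coef_j * x_j for each row
    return [intercept + sum(c * x for c, x in zip(coefs, row)) for row in rows]

levels = ["red", "green", "blue"]
colours = ["red", "green", "blue", "green", "red"]

# Full encoding: every row sums to 1, so the columns add up to the intercept column.
full = one_hot(colours, levels)
row_sums = [sum(row) for row in full]

# Shift the intercept up by 3 and every coefficient down by 3:
# the parameters differ, but the predictions do not.
eta_a = linear_predictor(0.5, [1.0, -2.0, 0.25], full)
eta_b = linear_predictor(0.5 + 3.0, [1.0 - 3.0, -2.0 - 3.0, 0.25 - 3.0], full)
print(eta_a)
print(eta_b)

# Reference coding: "red" is absorbed into the intercept, leaving two columns.
reduced = one_hot(colours, levels, drop_first=True)
print(reduced)
```

With reference coding, each remaining coefficient is read as the difference from the dropped level, which is one common way to restore identifiability without changing what the model can express.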
Definitely, thanks a lot. I dislike one-hot encoding for exactly this reason: a 1 in category X implies a 0 in all the others, which is so redundant. But I’m still learning, and it was the only way I could think of to feed the data into the model.
I will try PCA, as mentioned in your other post and above by @5hv5hvnk.
Thanks to you both!