I’ve observed a significant difference in a logistic regression model depending on how a dummy-encoded categorical feature is included: using all of the dummy columns vs. using all except one. In the latter case the chains are much tighter and cleaner, there are no divergences, and the rhat values are all 1. In the former case there are lots of divergences, ugly chains, and rhat values of 1.05 to 1.12.
Note that the features are several independent binary features plus one categorical variable that is dummy encoded.
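To make the comparison concrete, here is a minimal sketch of the two encodings, assuming pandas is used for the dummy encoding. The data and the column names (`bin_a`, `bin_b`, `cat`) are made up for illustration and are not my actual data.

```python
import pandas as pd

# Hypothetical data: two independent binary features plus one categorical.
df = pd.DataFrame({
    "bin_a": [0, 1, 1, 0, 1, 0],
    "bin_b": [1, 1, 0, 0, 1, 0],
    "cat": ["red", "green", "blue", "green", "red", "blue"],
})

# Encoding 1: keep every dummy column of the categorical variable.
X_full = pd.get_dummies(df, columns=["cat"], dtype=int)

# Encoding 2: drop one level, which then acts as the reference category.
X_dropped = pd.get_dummies(df, columns=["cat"], drop_first=True, dtype=int)

print(list(X_full.columns))
print(list(X_dropped.columns))
```

In the first encoding the dummy columns for `cat` sum to 1 in every row; in the second, the dropped level is the one where all remaining `cat` dummies are 0.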
I have two questions:
What is the reason for this?
Since I’d prefer to use the better-identified model, how do I estimate the effect of the left-out category of the categorical variable?
Here are some trace plots from the two models.
Model with all dummy columns from the categorical variable:
Model with one dummy column dropped: