Best logistic model structure for boolean covariates and interactions

I have a simple model where I’m trying to predict whether someone will answer yes/no to a question based on whether they are employed (0/1), married (0/1), have kids (0/1), their age (continuous), and 5 other domain-specific boolean covariates. What would be the best way to specify an interaction model such as this?

I understand how to specify an interaction between age and the other features. However, I don’t know an efficient way to specify interactions among the binary categorical features (e.g. being married AND having kids AND being employed may have an interaction effect). There are 1024 combinations of the boolean features, which makes things particularly complicated. Any ideas?

Having that many interaction terms seems impractical from an inference standpoint too. Do you have reason to think that a + b + c + a:b + a:c + b:c + a:b:c, and so on, would capture more information than the simple additive model a + b + c, which treats the effects as independent?
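To make the growth in terms concrete, here is a minimal sketch that counts the interaction terms of each order and builds a formula string limited to pairwise interactions. It assumes the eight boolean covariates listed in the question; the names `d1` to `d5` and the response name `answer` are placeholders, and the `a:b` notation follows the formula style used above rather than any particular library.

```python
from itertools import combinations
from math import comb

# Three names from the question plus five hypothetical placeholders
# for the domain-specific booleans.
booleans = ["employed", "married", "kids", "d1", "d2", "d3", "d4", "d5"]


def interaction_terms(names, max_order):
    """Return formula-style terms (e.g. 'married:kids') up to max_order.

    Order 1 gives the main effects, order 2 adds pairwise products, etc.
    """
    terms = []
    for order in range(1, max_order + 1):
        for combo in combinations(names, order):
            terms.append(":".join(combo))
    return terms


k = len(booleans)

# Number of terms contributed by each interaction order.
for order in range(1, k + 1):
    print(f"order {order}: {comb(k, order)} terms")

# Restricting to main effects plus pairwise interactions keeps the model small.
pairwise = interaction_terms(booleans, max_order=2)
full = interaction_terms(booleans, max_order=k)

# 'answer' is a hypothetical name for the yes/no response.
formula = "answer ~ age + " + " + ".join(pairwise)
print(len(pairwise), len(full))
print(formula)
```

With these eight covariates, stopping at pairwise interactions gives 36 terms, while the fully saturated specification gives 255, which is one way to see why capping the interaction order is usually the practical choice.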

If you still want to enforce or measure the correlation between coefficients, I suppose you could try drawing them from a correlated MvNormal, similar to what McElreath does here: http://xcelab.net/rmpubs/Mcelreath%20Koster%202014.pdf
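As a rough illustration of what "correlated coefficients" means, the sketch below draws coefficient vectors from a zero-mean multivariate normal in which every pair of coefficients shares the same correlation `rho`. This is not the model from the linked paper. It is a standard-library-only toy that assumes an equal-correlation covariance and uses a shared-factor construction, and the function names are made up for this example. In a real model you would more likely use your modelling library's multivariate normal distribution and put a prior on the correlation matrix.

```python
import random


def draw_correlated_coefs(n_coefs, sigma, rho, rng):
    """One draw from MvNormal(0, Sigma) with Sigma_ii = sigma^2 and
    Sigma_ij = rho * sigma^2 for i != j (equal-correlation structure).

    Uses a shared standard-normal factor, which requires 0 <= rho <= 1.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this construction needs 0 <= rho <= 1")
    shared = rng.gauss(0.0, 1.0)
    return [
        sigma * (rho ** 0.5 * shared + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0))
        for _ in range(n_coefs)
    ]


def sample_corr(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# e.g. coefficients for employed, married, kids drawn jointly
draws = [draw_correlated_coefs(3, sigma=1.0, rho=0.6, rng=rng) for _ in range(20000)]

b_employed = [d[0] for d in draws]
b_married = [d[1] for d in draws]
print(round(sample_corr(b_employed, b_married), 2))  # should be close to rho
```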
