That means you want the probability of a malicious flag (the Bernoulli probability of success) to increase when `invoke` increases, which means you expect the coefficient on that predictor to be concentrated on positive values. You can encode that knowledge directly into the mean of your Normal prior. The same reasoning applies to any of your predictors.
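As a minimal sketch (the predictor name `invoke` and the prior values here are illustrative, not taken from your model), shifting the Normal prior's mean above zero is enough to put most prior mass on positive slopes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prior for the coefficient on `invoke`:
# centering the Normal above zero encodes "more invokes -> higher flag probability".
beta_invoke = rng.normal(loc=1.0, scale=0.5, size=10_000)

# Most of the prior mass now sits on positive slopes (~98% with these values).
print((beta_invoke > 0).mean())
```

Note this is a soft constraint: the prior still allows negative slopes, so the data can overrule you if your domain knowledge turns out to be wrong.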
Then you need to do prior and posterior predictive checks to see how (and even if) these changes affect your model and results.
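A prior predictive check can be sketched by hand: draw parameters from the priors, push them through the sigmoid, and look at whether the implied probabilities are plausible before seeing any data (all names and prior values below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical standardized predictor values for `invoke`
invoke = rng.normal(size=200)

# Draw parameters from illustrative priors
intercept = rng.normal(-3.0, 1.0, size=1_000)
beta = rng.normal(1.0, 0.5, size=1_000)

# Prior predictive: implied success probabilities, one row per prior draw
p = sigmoid(intercept[:, None] + beta[:, None] * invoke[None, :])

# With a negative intercept prior, most implied probabilities should be low
print(p.mean(), np.quantile(p, [0.05, 0.95]))
```

If the implied probabilities pile up at 0 and 1, your priors are too wide on the logit scale; most PPLs (PyMC, Stan, etc.) have built-in helpers for this, so you don't have to do it manually.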
More generally, I’m skeptical that you need 2k predictors to have a good model. I would encourage you to try your hand at a much more parsimonious model first. That’s because:
- It’ll be much easier to reason about it (I personally can’t hold 2k predictors in my head and explain how they independently influence the probability of success).
- It’ll probably make fitting more efficient – my prior is that lots of these 2k predictors are strongly correlated, i.e. they give you redundant information.
- From a scientific standpoint, my guess for now is that you don’t need 2k predictors to explain this phenomenon.
Setting `constant=0` means that the probability of success in that case is expected to be 50% (because sigmoid(0) = 0.5), so that’s actually not what you want.
I would keep the intercept (it’s usually very helpful in any regression setting) and give it a prior centered on negative values, to encode your knowledge that the probability of success in those cases is close to 0. If the model then infers that the intercept is indeed (very) negative, that’s a good sign, given your domain knowledge.
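To make the sigmoid arithmetic concrete (the -4 below is just an illustrative value, not a recommendation for your prior mean):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# With all predictors at 0, the intercept alone sets the baseline probability.
print(sigmoid(0))    # 0.5   -> what fixing the intercept to 0 implies
print(sigmoid(-4))   # ~0.018 -> what a clearly negative intercept encodes
```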
Hope this helps