Logistic Regression with severe divergences

Great, I’m glad that worked!

I agree with Ricardo that things might be different with other models, but for logistic regressions I would always try to use the logit if possible, basically because we can keep all probabilities on the log scale. You can see in the source that pymc3 does something special with the logits, whereas log(p) is taken directly if probabilities are given. So if your logit is very negative, for example, invlogit(x) can underflow to zero, and taking log(0) then gives minus infinity. The code using the logits, on the other hand, should still work fine. If you’re curious, you could try the two versions from the source code on some extreme values of the logit, e.g. -40 or so, and compare the results.
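Here’s a small sketch of what I mean (assuming a recent PyMC3 where `Bernoulli` accepts `logit_p`; I’m using an even more extreme logit than -40 just to force the underflow, and scipy’s `expit` as the inverse logit):

```python
import numpy as np
import pymc3 as pm
from scipy.special import expit  # inverse logit

x = -800.0  # an extreme logit, chosen so that invlogit(x) underflows to exactly 0

# Going through the probability first: invlogit(x) underflows to 0.0,
# so the log-likelihood of observing a 1 becomes log(0) = -inf.
p = expit(x)
print(p, np.log(p))  # 0.0 -inf

# Passing the logit directly keeps everything on the log scale,
# so the same log-likelihood stays finite (roughly -800 here).
print(pm.Bernoulli.dist(logit_p=x).logp(1).eval())

# The probability parameterization hits the log(0) problem.
print(pm.Bernoulli.dist(p=p).logp(1).eval())  # -inf
```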

I do have some intuition for why things are fine with a few predictors. When you compute X \beta in your logistic regression, each row of the result is a sum of random variables: x_1 \beta_1 + x_2 \beta_2 + \dots + x_n \beta_n. Under your prior the \beta_i are independent with variance 1, so you can compute the prior variance of that sum as the sum of the individual variances. For a given (constant) row x, that ends up being \sum x_i^2 Var(\beta_i) = \sum x_i^2. If you have only a few predictors, that sum is likely to be small. But if you have many, say 1000, and the predictors are roughly on unit scale, the sum is about the number of predictors, so each logit has a prior variance of around 1000, i.e. a standard deviation of over 30, and logits that large can start causing trouble. If you’re curious, it might be interesting to see whether things work better with the invlogit if you make the prior on \beta tighter, e.g. \mathcal{N}(0, \sqrt{1 / p}) where p is your number of predictors. But that’s a stronger prior of course, and you may not be comfortable with it!
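To put a quick number on that, here’s a sketch with a made-up standardized design matrix (so the predictors really are on unit scale), comparing the spread of the logits under the two priors:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.normal(size=(n, p))  # hypothetical standardized predictors

# One prior draw of beta under N(0, 1) and under the tighter N(0, sqrt(1/p))
beta_wide = rng.normal(0.0, 1.0, size=p)
beta_tight = rng.normal(0.0, np.sqrt(1.0 / p), size=p)

# Standard deviation of the implied logits X @ beta across rows
print(np.std(X @ beta_wide))   # around sqrt(p), i.e. ~31
print(np.std(X @ beta_tight))  # around 1
```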
