Simpson's paradox and mixed models - large number of groups

@aseyboldt what’s the rationale behind the “normalize the predictor x” argument?

I am currently experimenting with a variant of the example in which x is sampled from a common (univariate standard normal) distribution for all groups, and posterior sampling improves (no divergences, larger ESS) when I demean the predictor x within each group.
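
A minimal sketch of the demeaning step I mean (simulated data; the names and sizes are placeholders rather than my actual setup):

```python
import numpy as np

rng = np.random.default_rng(42)
n_groups, n_per_group = 50, 20
group_idx = np.repeat(np.arange(n_groups), n_per_group)

# x drawn from a common standard normal for all groups
x = rng.normal(size=n_groups * n_per_group)

# per-group means of x, subtracted from every observation in that group
group_means = np.array([x[group_idx == g].mean() for g in range(n_groups)])
x_centered = x - group_means[group_idx]
```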

I have also seen the “normalize the predictor” suggestion elsewhere, but I haven’t found a convincing argument for it yet…

Btw: modeling the group means of the predictor explicitly, on the other hand, does not help at all. In fact, sampling efficiency is severely reduced, and the estimate of the population-level intercept (alpha) also seems to be biased - this is something else I am currently struggling to understand.
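
To make that concrete, here is roughly the kind of model variant I mean, read as a Mundlak-style formulation where the per-group mean of x enters as an additional covariate with its own slope. This is only a sketch: the priors, names, and the placeholder outcome are my own assumptions, not the code from the original example.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)
n_groups, n_per_group = 50, 20
group_idx = np.repeat(np.arange(n_groups), n_per_group)
x = rng.normal(size=n_groups * n_per_group)   # common distribution for all groups
y = rng.normal(size=x.size)                   # placeholder outcome, just so this runs

group_means = np.array([x[group_idx == g].mean() for g in range(n_groups)])
x_centered = x - group_means[group_idx]

with pm.Model(coords={"group": np.arange(n_groups)}) as model:
    alpha = pm.Normal("alpha", 0.0, 1.0)                     # population-level intercept
    beta_within = pm.Normal("beta_within", 0.0, 1.0)         # slope on within-group (demeaned) x
    beta_between = pm.Normal("beta_between", 0.0, 1.0)       # slope on the group mean of x
    alpha_g = pm.Normal("alpha_g", 0.0, 1.0, dims="group")   # group-level intercept offsets
    sigma = pm.HalfNormal("sigma", 1.0)

    mu = (alpha
          + alpha_g[group_idx]
          + beta_within * x_centered
          + beta_between * group_means[group_idx])
    pm.Normal("y_obs", mu, sigma, observed=y)

    # idata = pm.sample()  # this is the variant where sampling degrades for me
```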