Simpson's paradox and mixed models - large number of groups

@aseyboldt thank you very much for your quick reply and the very detailed suggestions. I have tried this, and with that setup I am also getting a lowest ESS larger than 400.

With the original 5 groups, however, I am getting the following (much lower) ESS values:

```
<xarray.Dataset> Size: 96B
Dimensions:              ()
Data variables:
    intercept            float64 8B 30.51
    group_sigma_log__    float64 8B 177.3
    group_offset         float64 8B 36.71
    x_effect             float64 8B 29.39
    x_group_sigma_log__  float64 8B 27.33
    x_group_offset       float64 8B 198.2
    sigma_log__          float64 8B 5.215e+03
    group_sigma          float64 8B 177.3
    x_group_sigma        float64 8B 27.33
    sigma                float64 8B 5.215e+03
    group_effect         float64 8B 29.7
    x_group_effect       float64 8B 27.33
```

I am assuming (as a total beginner) that ESS stands for effective sample size, and that larger is better?

So with those numbers the effective sample size for, e.g., the (most interesting) `x_effect` seems far too low. Given that the number of draws is 1000, an ESS of ~30 would indicate an autocorrelation time of roughly 30 steps; is that reasoning valid at all? My MCMC theory is a bit rusty, so I might be totally off here. So for the original number of groups sampling is still difficult, is that understanding correct?
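For reference, this is roughly how I am computing and reading these numbers (a sketch, assuming `idata` is the `InferenceData` returned by `pm.sample`):

```python
import arviz as az

# idata is assumed to be the InferenceData returned by pm.sample()
ess = az.ess(idata)  # bulk effective sample size per variable

# Rough integrated autocorrelation time: tau ~ total draws / ESS
n_total = idata.posterior.sizes["chain"] * idata.posterior.sizes["draw"]
print(n_total / ess)  # ~30+ draws per effectively independent sample for x_effect
```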

As someone who is new to Bayesian hierarchical modelling, I find it quite difficult to understand why this specific reparametrisation, i.e. replacing $\mu = \beta_0[g] + \beta_1[g] \cdot x$ in the original (linear) model with $\mu = \text{intercept} + \text{group\_effect}[g] + (\text{x\_effect} + \text{x\_group\_effect}[g]) \cdot x$, and/or adding noise to the slope in the data-generating process, improves sampling.
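For anyone else reading along, this is how I currently picture the two parametrisations in PyMC (a minimal sketch with assumed priors, data, and names, not the actual notebook code):

```python
import numpy as np
import pymc as pm

# Toy stand-ins for the real data (assumed shapes, not the notebook's data)
n_groups = 5
g = np.repeat(np.arange(n_groups), 20)  # group index per observation
x = np.random.default_rng(0).normal(size=g.size)
y = np.random.default_rng(1).normal(size=g.size)

# Original ("centered") parametrisation: one intercept and slope per group
with pm.Model() as centered:
    beta0 = pm.Normal("beta0", 0, 10, shape=n_groups)
    beta1 = pm.Normal("beta1", 0, 10, shape=n_groups)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("y", mu=beta0[g] + beta1[g] * x, sigma=sigma, observed=y)

# Reparametrised ("non-centered") version: shared effects plus scaled,
# zero-centred per-group offsets
with pm.Model() as non_centered:
    intercept = pm.Normal("intercept", 0, 10)
    x_effect = pm.Normal("x_effect", 0, 10)
    group_sigma = pm.HalfNormal("group_sigma", 1)
    x_group_sigma = pm.HalfNormal("x_group_sigma", 1)
    group_offset = pm.Normal("group_offset", 0, 1, shape=n_groups)
    x_group_offset = pm.Normal("x_group_offset", 0, 1, shape=n_groups)
    group_effect = pm.Deterministic("group_effect", group_sigma * group_offset)
    x_group_effect = pm.Deterministic("x_group_effect", x_group_sigma * x_group_offset)
    sigma = pm.HalfNormal("sigma", 1)
    mu = intercept + group_effect[g] + (x_effect + x_group_effect[g]) * x
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)
```

My current (possibly wrong) understanding is that `group_offset` and `x_group_offset` are standard normal a priori, independent of `group_sigma`, which is supposed to give NUTS a nicer geometry to sample (avoiding the "funnel").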

Thanks for looking into the “bias” on `x_effect`; I’d be very glad to get any insights on that.

By the way, I think the notebook is an excellent example for getting started with hierarchical modelling, and I am very grateful for the excellent PyMC documentation and examples, as well as the community's willingness to answer questions and help out. Thanks a lot to everyone!

Edit: Surprisingly, the “bias” goes away if I reduce the factor 2 in `group_mx` to 0.2. If I increase it to 20, the ESS drops significantly, so there seems to be some undesired interaction between the “distance” between the different groups and the sampling and convergence properties. I don’t understand why yet, but if I figure it out I will follow up here.
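For context, this is roughly the kind of data-generating process I mean (a sketch with assumed names and values, not the actual notebook code; `factor` stands for the 2 in `group_mx` that I varied):

```python
import numpy as np

rng = np.random.default_rng(42)

n_groups, n_per_group = 5, 200
factor = 2.0  # the value I varied between 0.2 and 20

group_mx = factor * np.arange(n_groups)          # group-specific centres of x
g = np.repeat(np.arange(n_groups), n_per_group)  # group index per observation
x = rng.normal(group_mx[g], 1.0)                 # x clusters around its group centre

# Simpson's-paradox setup: the between-group trend (via the intercepts)
# runs opposite to the within-group slope
intercepts = 1.5 * group_mx
y = intercepts[g] - 0.5 * x + rng.normal(0.0, 0.3, size=x.size)
```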