Simpson's paradox and mixed models - large number of groups

@jessegrabowski Thanks a lot for sharing the video and suggesting another re-parametrisation that explicitly models the group mean along the x-coordinate. One thing I noticed is that ‘x_hat’ is never used; is this intended?

I have tried it out and can confirm that it yields the “unbiased” posterior estimate of the “effect” of interest (mu_slope in the model above), similar to the plot above. I can also confirm the low ESS, e.g. below 10 for the group mean of the x-coordinate, cf. the output below.

```
Dimensions:                ()
Data variables: (12/20)
    x_intecept             float64 8B 202.3
    x_beta                 float64 8B 9.959
    mx                     float64 8B 9.838
    x_sigma_log__          float64 8B 1.33e+04
    mu_intercept           float64 8B 203.2
    sigma_intercept_log__  float64 8B 1.959e+03
    ...                     ...
    offset_intercept       float64 8B 7.995e+03
    sigma_slope            float64 8B 2.036e+03
    offset_slope           float64 8B 7.175e+03
    sigma                  float64 8B 1.216e+04
    intercept              float64 8B 250.8
    slope                  float64 8B 4.277e+03
```
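(For reference, the numbers above are the per-variable effective sample sizes; a summary like this can be produced along the following lines, assuming the sampling result is an InferenceData object named idata:)

```python
import arviz as az

# Effective sample size per free parameter of the model;
# returns an xarray Dataset like the one shown above.
# (idata is assumed to be the InferenceData returned by pm.sample.)
ess = az.ess(idata)
print(ess)
```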

One thing that is not really clear to me in the above re-parametrisation is the usage of the ZeroSumNormal: what is the benefit of using it here?
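For context, my current (possibly wrong) understanding is that it pins the group offsets to sum to zero, so the global mean and the offsets cannot trade off against each other. A minimal sketch of that idea (all names here are illustrative, not from the model above):

```python
import pymc as pm

n_groups = 8  # illustrative

with pm.Model(coords={"group": range(n_groups)}) as sketch:
    mu = pm.Normal("mu", 0.0, 10.0)  # global mean
    # Offsets constrained to sum to zero across the "group" dim:
    # the overall level then lives entirely in mu, which removes the
    # additive non-identifiability between mu and unconstrained offsets.
    offset = pm.ZeroSumNormal("offset", sigma=1.0, dims="group")
    group_mean = pm.Deterministic("group_mean", mu + offset, dims="group")
```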

The low ESS values for x_beta and mx seem to be independent of the group separation in x: changing group_mx = group_intercepts * 2.0 to group_mx = group_intercepts * 0.2 or to group_mx = group_intercepts * 20.0 neither improves nor worsens them. However, with group_mx = group_intercepts * 20.0 the ESS values for e.g. mu_intercept are significantly reduced, so the “sampling efficiency” still seems to depend on this parameter of the data generating process. I suspect the definition of mx_hat is not compatible with those “scaling changes”, though I am wildly guessing here.
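For concreteness, the scaling experiment on the data generating side amounts to something like this (a sketch only; group_intercepts and group_mx are from the original example, the remaining names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

n_groups, n_per_group = 20, 30
group_intercepts = rng.normal(0.0, 1.0, size=n_groups)

# Scale factor controlling the group separation along x;
# I compared 0.2, 2.0 and 20.0 here.
scale = 2.0
group_mx = group_intercepts * scale

# Group-specific x values scattered around each group's mean mx.
x = rng.normal(loc=np.repeat(group_mx, n_per_group), scale=1.0)
```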

I will dig deeper here. Thanks for all the great help, suggestions, and explanations so far; I am learning a lot. If there are ideas on how to make the example "work out of the box", I'd be super glad to hear them.

Edit: The x_beta posterior appears to have two disconnected, symmetric modes (probably due to insufficient mixing, but again this is wild guessing on my side), cf. the figure below.
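One check I plan to do is to look at x_beta per chain, since disconnected modes often mean each chain is stuck in one of them. Roughly (again assuming the trace lives in idata):

```python
import arviz as az

# The trace plot overlays all chains; if each chain sits in one of the
# two modes without ever jumping, it is a mixing problem rather than a
# genuinely bimodal posterior.
az.plot_trace(idata, var_names=["x_beta"], legend=True)
az.plot_posterior(idata, var_names=["x_beta"])
```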