Simpson's paradox and mixed models - large number of groups

I don’t think the two modes of x_beta are due to mixing. If that were the case, you’d see individual chains jumping between the modes. What you see instead is that each chain has converged to one of two equally probable posterior modes, depending on the sampler’s starting position.
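One way to check this is to compare the chains directly, for example with ArviZ. A minimal sketch, assuming `idata` is the `InferenceData` returned by `pm.sample()` and `x_beta` is the name of the bimodal parameter (both are placeholders here):

```python
import arviz as az

def check_bimodality(idata, var_name="x_beta"):
    """Diagnose whether chains sit in separate modes of `var_name`."""
    # Per-chain posterior means: chains stuck in different modes show
    # clearly separated means rather than agreement across chains.
    print(idata.posterior[var_name].mean(dim="draw"))

    # R-hat compares within-chain and between-chain variance, so chains
    # parked in different modes push it well above 1.01.
    print(az.rhat(idata, var_names=[var_name]))

    # Trace plot, one line per chain: mixing between modes would show a
    # single chain jumping back and forth; flat, separated traces mean
    # each chain converged to its own mode.
    az.plot_trace(idata, var_names=[var_name], compact=False)
```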

I noticed it as well, but since the x model is just a bunch of nuisance parameters from the perspective of inference on mu_slope, I ignored it. Ideally you’d marginalize it all away mathematically, but that’s well beyond my ability. You could try removing the slope component from x_hat (sketched below) and see whether it does any better or worse.
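For illustration only, here is roughly what that change might look like. This is a guess at the model’s structure: `x_hat`, `x_intercept`, `x_slope`, `t`, and `group_idx` are stand-ins for whatever the original model actually uses, and the fake data exists just to make the snippet run:

```python
import numpy as np
import pymc as pm

# Fake data purely so the sketch is runnable.
rng = np.random.default_rng(0)
n_groups, n_obs = 5, 100
group_idx = rng.integers(n_groups, size=n_obs)
t = rng.normal(size=n_obs)
x_obs = rng.normal(size=n_obs)

with pm.Model() as model:
    x_intercept = pm.Normal("x_intercept", 0.0, 1.0, shape=n_groups)

    # Hypothetical original form with a per-group slope term:
    #   x_slope = pm.Normal("x_slope", 0.0, 1.0, shape=n_groups)
    #   x_hat = x_intercept[group_idx] + x_slope[group_idx] * t
    # Dropping the slope component to test whether the bimodality persists:
    x_hat = x_intercept[group_idx]

    pm.Normal("x", mu=x_hat, sigma=1.0, observed=x_obs)
```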

Maybe nothing? It’s meant to account for the fact that you only have n - 1 degrees of freedom when estimating slope offsets. It’s related to the dummy variable trap, if you ever ran into that in an OLS class. In my experience you can get away with plain Normals just fine, but ZeroSumNormal is new and hip. There are also some extremely subtle differences in model interpretation (check the very last paragraph of that post), but I’m not sure they’re relevant here.
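To make the degrees-of-freedom point concrete, a minimal sketch of the two parameterizations side by side (`n_groups` and the offset names are placeholders):

```python
import pymc as pm

n_groups = 10

with pm.Model():
    # Unconstrained offsets: n_groups free parameters, so one direction
    # (a shared shift across all groups) is redundant with the
    # population-level slope.
    slope_offset_normal = pm.Normal(
        "slope_offset_normal", 0.0, 1.0, shape=n_groups
    )

    # ZeroSumNormal constrains the offsets to sum to zero, leaving only
    # the n - 1 degrees of freedom the offsets actually have and
    # removing the redundancy with the population-level term.
    slope_offset_zsn = pm.ZeroSumNormal(
        "slope_offset_zsn", sigma=1.0, shape=n_groups
    )
```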