How many samples are drawn, and how many are used for tuning, in the other samplers? At a quick glance, the smallest posterior mean of your “w”s is about -6.5, which is much higher than -22.
My first guess: 1000 draws and 1000 tuning steps are probably not enough, since the prior seems far from the posterior (as the errors suggest). That would also explain why using -2 as your prior yields better results: it is closer to the posterior means of your “w” parameters. Let me know if this helps (or not so much)!
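To illustrate the intuition, here is a toy sketch in plain Python. It is not your model or your sampler: it is a basic random-walk Metropolis sampler on a made-up normal target centred at -6.5, and the function names, step size, and draw counts are all my own assumptions. It compares chains started at -22 and at -2, on the assumption that a badly placed prior also puts the starting point far from the posterior.

```python
import math
import random


def log_target(x, mean=-6.5, sd=1.0):
    """Unnormalised log density of a hypothetical N(mean, sd) posterior."""
    return -0.5 * ((x - mean) / sd) ** 2


def metropolis(start, draws, tune, step=0.5, seed=0):
    """Random-walk Metropolis; the first `tune` iterations are discarded."""
    rng = random.Random(seed)
    x = start
    kept = []
    for i in range(tune + draws):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, target(proposal) / target(x)).
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_target(proposal) - log_target(x):
            x = proposal
        if i >= tune:
            kept.append(x)
    return kept


def mean(xs):
    return sum(xs) / len(xs)


# Start far from the target (-22) with no warm-up, then with a long warm-up.
far_short = mean(metropolis(start=-22.0, draws=300, tune=0))
far_long = mean(metropolis(start=-22.0, draws=300, tune=1000))
# Start closer to the target (-2) with no warm-up.
near_short = mean(metropolis(start=-2.0, draws=300, tune=0))

print("start -22, tune 0:   ", round(far_short, 2))
print("start -22, tune 1000:", round(far_long, 2))
print("start  -2, tune 0:   ", round(near_short, 2))
```

In this toy setup the chain started at -22 without warm-up spends a good part of its 300 draws just travelling toward -6.5, so its mean is pulled well below the target. Either a longer warm-up or a closer starting point removes most of that bias. Your sampler's tuning phase probably does more than this (for example, adapting step sizes), so treat it only as a picture of why more tuning could help.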