Eight schools problem with Student-t distribution for treatment effects

Welcome!

This is expected. Requiring thin tails (large nu) forces the scale to increase in order to accommodate outlying observations (so if tau is parameterized as a precision, tau decreases).
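To illustrate the tail/scale trade-off, here is a minimal sketch using only the standard library. The data values, the grid, and the helper names `student_t_logpdf` and `best_scale` are made up for illustration, and the location is fixed at zero to keep things simple. It finds the scale that best fits a small sample with one outlier, once with heavy tails and once with nearly normal tails.

```python
import math


def student_t_logpdf(x, nu, scale):
    """Log density of a zero-centered Student-t with nu degrees of freedom."""
    z2 = (x / scale) ** 2
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(scale)
        - (nu + 1) / 2 * math.log1p(z2 / nu)
    )


def best_scale(data, nu):
    """Grid-search the scale that maximizes the total log likelihood."""
    grid = [0.05 * k for k in range(1, 201)]  # 0.05, 0.10, ..., 10.0
    return max(grid, key=lambda s: sum(student_t_logpdf(x, nu, s) for x in data))


# Made-up values: seven small effects plus one outlier at 10.
data = [-1.0, -0.5, -0.2, 0.0, 0.2, 0.5, 1.0, 10.0]

heavy = best_scale(data, nu=2)    # heavy tails can absorb the outlier
thin = best_scale(data, nu=100)   # nearly normal tails cannot
print(f"nu=2:   best scale = {heavy:.2f}")
print(f"nu=100: best scale = {thin:.2f}")
```

With heavy tails the outlier is cheap, so the scale can stay close to the spread of the bulk of the data. With thin tails the only way to make the outlier plausible is to widen the whole distribution.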

As for the poor sampling, it could be many things. It could be the amount of data you have per level/category in the hierarchy (e.g., per school), it could be something about the distribution of data within a level/category, it could be the distribution of means/SDs across categories, etc. Have you tried increasing target_accept? A higher value makes the sampler take smaller steps, which often reduces divergences in hierarchical models.

As for hierarchical models that use non-normal “random effects”, here is one example.