Ideas for reparameterizing models/changing priors to avoid divergences

Yeah, I think so, because the baseline is actually the average effect across clusters. Then you add the deviations, which are the cluster-specific offsets from that average.
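
In case it helps, here is a rough NumPy sketch of that "baseline plus deviations" idea, written in the non-centered form that is often suggested for avoiding divergences. It is only an illustration, not your actual model: the names `baseline`, `sigma`, `z`, and `cluster_effects` are made up, and the numbers are arbitrary.

```python
import numpy as np


def cluster_effects(baseline, sigma, z):
    """Combine the average effect with per-cluster deviations.

    baseline: average effect across clusters (hypothetical name)
    sigma:    spread of the cluster effects around the baseline
    z:        standardized deviations, one per cluster, z ~ Normal(0, 1)
    """
    z = np.asarray(z, dtype=float)
    deviations = sigma * z        # cluster-specific offsets from the average
    return baseline + deviations  # effect within each cluster


rng = np.random.default_rng(0)
z = rng.standard_normal(5)        # five hypothetical clusters
effects = cluster_effects(baseline=1.5, sigma=0.8, z=z)
print(effects)
```

In a sampler you would put the priors on `baseline`, `sigma`, and `z` instead of on the cluster effects directly, so the sampler works with the standardized `z` rather than with effects whose scale depends on `sigma`.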