Dan Simpson et al.'s paper is great. Here’s the final version (we’re linking a draft in the Stan priors recommendations).
The basic idea is that you have a simple model and a complex model: you scale them both to constant variance, interpolate between them, and then rescale for the actual variance. The problem for me is figuring out the math to get the scaling of the base distributions right.
I hadn’t realized someone had worked out the negative binomial vs. Poisson case. I always just penalize the overdispersion parameter, but the negative binomial is hard to fit because there are two ways to explain high values: a higher mean or more dispersion. That leads to a lot of correlation in the posteriors, even with a strong shrinkage prior.
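To make the "two ways to explain high values" point concrete, here is a rough sketch in plain Python. It assumes the mean/dispersion parameterization (the one Stan calls `neg_binomial_2`), where the variance is mu + mu^2 / phi, so a smaller phi means more overdispersion and phi going to infinity recovers the Poisson. The function names and the specific numbers are made up for illustration; they are not from the paper.

```python
import math

def nb2_variance(mu, phi):
    # Assumed parameterization: Var[Y] = mu + mu^2 / phi.
    # As phi grows, this approaches the Poisson variance mu.
    return mu + mu**2 / phi

def nb2_log_pmf(y, mu, phi):
    # Log probability mass of the negative binomial with mean mu
    # and dispersion phi (hypothetical helper, same parameterization).
    return (math.lgamma(y + phi) - math.lgamma(y + 1) - math.lgamma(phi)
            + y * math.log(mu / (mu + phi))
            + phi * math.log(phi / (mu + phi)))

def nb2_tail_prob(k, mu, phi):
    # P(Y >= k), computed as one minus the mass below k.
    return 1.0 - sum(math.exp(nb2_log_pmf(y, mu, phi)) for y in range(k))

# Illustrative settings only: a nearly Poisson baseline, then the two
# different ways of making a count of 12 or more plausible.
baseline = nb2_tail_prob(12, mu=5.0, phi=50.0)
higher_mean = nb2_tail_prob(12, mu=8.0, phi=50.0)
more_dispersion = nb2_tail_prob(12, mu=5.0, phi=1.0)

print(f"baseline        P(Y >= 12) = {baseline:.4f}")
print(f"higher mean     P(Y >= 12) = {higher_mean:.4f}")
print(f"more dispersion P(Y >= 12) = {more_dispersion:.4f}")
```

Raising the mean and lowering phi both push mass into the upper tail, so a handful of large counts cannot tell the two apart very well. That is one way to see where the posterior correlation between the mean and the dispersion comes from.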