Improving model convergence and sampling speed

In my experience, hierarchical models tend to sample very slowly with NUTS. One approach that helps, especially with a large set of observations, is to first approximate the posterior with ADVI on mini-batches (which can speed up computation about 2-fold). If the approximation is good enough, draw samples from it directly with sample_approx; otherwise, use it as the starting point for Metropolis or NUTS. In my experience, Metropolis tends to drift away from that starting point and gives poor estimates of the posterior.

See the following (some of the methods are outdated):