NUTS sampler slowing down on a hierarchical model

This prior specification looks a little weird in the sense that it’s so diffuse:

Otherwise, I don’t see anything grossly out of place. About 1 sample per second doesn’t sound too bad for a nonlinear model with 60k data points. Have you considered running it on a GPU?
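
To make the point about diffuse priors a bit more concrete, here is a rough prior-predictive sketch using only the standard library. It does not use your actual model or priors: the `HalfNormal` scale prior, the values `100.0` and `1.0`, the threshold of 10, and the helper names `prior_draws` and `frac_extreme` are all assumptions for illustration. The idea is just to see how much prior mass lands on extreme group effects, which is often where NUTS ends up taking small steps.

```python
import random


def prior_draws(scale_sd, n=10_000, seed=0):
    """Draw group effects from a hypothetical hierarchical prior:
    tau ~ HalfNormal(scale_sd), effect ~ Normal(0, tau)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        tau = abs(rng.gauss(0.0, scale_sd))  # half-normal via |Normal|
        draws.append(rng.gauss(0.0, tau))    # group-level effect
    return draws


def frac_extreme(draws, threshold=10.0):
    """Fraction of draws whose absolute value exceeds the threshold."""
    return sum(abs(x) > threshold for x in draws) / len(draws)


# Illustrative scales only; substitute the ones from your own model.
diffuse = frac_extreme(prior_draws(scale_sd=100.0))
tight = frac_extreme(prior_draws(scale_sd=1.0))

print(f"diffuse prior: {diffuse:.2%} of effects beyond +/-10")
print(f"tighter prior: {tight:.2%} of effects beyond +/-10")
```

If most of the prior mass sits on values that would be implausible after the nonlinearity in your model, tightening the scale is usually worth trying before reaching for different hardware.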