Hierarchical model - NUTS - warnings after sampling - reparametrization?

I am trying to get rid of some divergence and max_tree_depth warnings in the models from Chapter 20 of Kruschke (2015). The models contain multiple nominal predictors. (I basically translated the JAGS code into PyMC3.)

I tried reparametrizing the models, which helped a lot in speeding up NUTS, but there are still issues. Since the original models were defined for JAGS, which uses slice sampling if I am not mistaken, could the model/priors be suboptimal for NUTS?
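For context, the reparametrization I tried is the usual non-centered trick, roughly like the sketch below (variable names, shapes, and the placeholder data are illustrative, not the exact ones from the Kruschke models):

```python
import numpy as np
import pymc3 as pm

# placeholder data: one nominal predictor with 4 levels
x = np.repeat(np.arange(4), 25)
y = np.random.randn(100)

with pm.Model() as model:
    # group-level scale for the deflections (same style of prior as in the book)
    aSigma = pm.Uniform('aSigma', lower=0, upper=10)

    # non-centered: sample standardized deflections and scale them,
    # instead of sampling a ~ Normal(0, aSigma) directly
    a_offset = pm.Normal('a_offset', mu=0, sd=1, shape=4)
    a = pm.Deterministic('a', a_offset * aSigma)

    ySigma = pm.Uniform('ySigma', lower=0, upper=10)
    pm.Normal('y_obs', mu=a[x], sd=ySigma, observed=y)

    trace = pm.sample()
```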

Did you try the new initialization for NUTS? (If you upgrade to master, it uses advi+adapt_diag as the default initialization.) I gave it a quick try and it seems to work better.
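Something like this (assuming you are on a recent master, where you can also request it explicitly; the draw/tune numbers are just examples):

```python
import pymc3 as pm

with model:
    # advi+adapt_diag initializes the mass matrix from an ADVI fit
    trace = pm.sample(2000, tune=1000, init='advi+adapt_diag')
```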

Having looked at your models, I will comment on one point of potential improvement:
The sigma (ySigma in the first two models and sigma in the last model) might not have the best parameterization. In general, a Uniform prior for sigma is not ideal, and a tt.max() would likely break the gradient; maybe try a HalfCauchy prior instead?
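For example, something along these lines (a minimal sketch; the beta value and the placeholder data are just illustrative, adjust to your model):

```python
import numpy as np
import pymc3 as pm

y = np.random.randn(100)  # placeholder data

with pm.Model() as model:
    # instead of a bounded Uniform (and tt.max() clipping), e.g.
    #   ySigma = pm.Uniform('ySigma', lower=0, upper=10)
    # a heavy-tailed half prior is usually friendlier to NUTS:
    ySigma = pm.HalfCauchy('ySigma', beta=5)
    pm.Normal('y_obs', mu=0.0, sd=ySigma, observed=y)
```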