I am trying to get rid of some divergence and max_tree_depth warnings in the models from Chapter 20 of Kruschke (2015). The models contain multiple nominal predictors. (I basically translated the JAGS code into PyMC3.)
I tried reparametrization of the models, which helped a lot in speeding up NUTS. But there are still issues. Since the original model was defined for JAGS, which uses Slice sampling if I am not mistaken, could the model/priors be suboptimal for NUTS?
Did you try the new initialization for NUTS? (If you upgrade to master, it uses advi+adapt_diag as the initialization.) I gave it a quick try and it seems to work better.
Having looked at your models, I will comment on one point of potential improvement:
The sigma (ySigma in the first two models and sigma in the last model) might not have the best parameterization. In general, a Uniform prior for sigma is not ideal, and a tt.max() would likely break the gradient. Maybe try a HalfCauchy prior?
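To make the gradient point concrete, here is a small plain-Python sketch (no PyMC3 or Theano involved). It assumes the max() is used as a hard floor on sigma, which is my reading of the models; the floor value 0.1, the scale beta=1.0, and the helper names are made up for illustration. It compares the slope of a floored sigma on either side of the floor with the slope of a half-Cauchy log density at the same points.

```python
import math


def floored_sigma(sigma, floor=0.1):
    """Sigma with a hard lower bound, as a max() construction would give."""
    return max(sigma, floor)


def half_cauchy_logpdf(sigma, beta=1.0):
    """Log density of a half-Cauchy(beta) prior for sigma >= 0."""
    return math.log(2.0 / (math.pi * beta)) - math.log1p((sigma / beta) ** 2)


def central_diff(f, x, h=1e-6):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Below the floor the floored sigma is flat, so the sampler gets no
# gradient signal there; above the floor the slope jumps to about 1.
slope_below = central_diff(floored_sigma, 0.05)
slope_above = central_diff(floored_sigma, 0.15)
print("floored sigma slopes:", slope_below, slope_above)

# The half-Cauchy log density changes smoothly across the same region.
print("half-Cauchy slopes:",
      central_diff(half_cauchy_logpdf, 0.05),
      central_diff(half_cauchy_logpdf, 0.15))
```

The flat region below the floor is the part that tends to hurt a gradient-based sampler like NUTS, whereas a slice sampler does not use gradients at all, which may be why the original JAGS model did not show the problem.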