@aseyboldt for LKJCholeskyCov's sd_dist, I've tried pm.Gamma.dist(alpha=1.5, beta=3.0, shape=3) and pm.InverseGamma.dist(alpha=2.0, beta=0.5, shape=3), and they give answers similar to what I got with pm.Exponential.dist(1, shape=3). NUTS doesn't seem to sample noticeably faster with any of them.
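For context, here's a minimal sketch of how I'm wiring these in (the eta value and the "chol" name are just placeholders, not my actual settings):

```python
import pymc as pm

# the three sd_dist candidates I compared (swap in one at a time)
sd_dist = pm.Exponential.dist(1.0, shape=3)            # original choice
# sd_dist = pm.Gamma.dist(alpha=1.5, beta=3.0, shape=3)
# sd_dist = pm.InverseGamma.dist(alpha=2.0, beta=0.5, shape=3)

with pm.Model():
    # compute_corr=True unpacks the Cholesky factor, correlations, and sds
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=3, eta=2.0, sd_dist=sd_dist, compute_corr=True
    )
```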
Just to be clear, are you saying that I also shouldn't use my pm.Uniform('mu_x', lower=0.01, upper=0.99) priors for the 3 means of pm.MvNormal, because NUTS might slow down when it runs into the hard boundaries of the Uniform? What would a well-behaved, uniform-like prior for the means of the MvNormal be, then? My mu_x and mu_y have no physical significance outside of 0-1, and mu_z has no meaning outside of -1 to 1. Would pm.Beta(alpha=1, beta=1) be better behaved than pm.Uniform for the first 2 means, with pm.Uniform(lower=-1, upper=1) kept for mu_z?
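To make the two options concrete, this is roughly what I mean (variable names mirror the ones above; my understanding is that PyMC transforms both Uniform and Beta to unconstrained space anyway, so I'm not sure how much it matters):

```python
import pymc as pm

with pm.Model():
    # Option A: hard-bounded Uniforms (my current priors)
    mu_x = pm.Uniform("mu_x", lower=0.01, upper=0.99)
    mu_y = pm.Uniform("mu_y", lower=0.01, upper=0.99)
    mu_z = pm.Uniform("mu_z", lower=-1.0, upper=1.0)

    # Option B: Beta(1, 1) has the same flat density on (0, 1);
    # for mu_z on (-1, 1), one could rescale a Beta instead
    mu_x_b = pm.Beta("mu_x_b", alpha=1.0, beta=1.0)
    mu_z_raw = pm.Beta("mu_z_raw", alpha=1.0, beta=1.0)
    mu_z_b = pm.Deterministic("mu_z_b", 2.0 * mu_z_raw - 1.0)
```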
@ricardoV94 would InverseGamma(alpha=2, beta=0.5) also lead to divergences, because its density goes to 0 very fast for x << 0.1? If so, that surprises me, since I thought InverseGamma (rather than Gamma or Exponential) was the usual conjugate prior for the variance of a normal.
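A quick scipy check of what I mean by the density vanishing near zero (scipy's `scale` corresponds to PyMC's beta here):

```python
from scipy import stats

inv_gamma = stats.invgamma(a=2.0, scale=0.5)  # InverseGamma(alpha=2, beta=0.5)
for x in (0.01, 0.05, 0.1, 0.5):
    print(f"pdf({x}) = {inv_gamma.pdf(x):.3e}")
# pdf(0.01) is ~5e-17, i.e. the prior puts essentially zero mass
# below ~0.05, which is the behaviour I'm asking about.
```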
I am actually leaning towards keeping my original sd_dist=pm.Exponential.dist(1, shape=3). It has the added advantage of pushing the standard deviations towards low values unless the data prefer otherwise, which reduces the variance of the model around the mean and perhaps makes it easier for the model to draw covariance matrices from the LKJ prior and converge faster. Thoughts?
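A rough scipy comparison of how much prior mass each candidate puts on small standard deviations, which is the shrinkage I have in mind (same parameters as above):

```python
from scipy import stats

candidates = {
    "Exponential(1)":       stats.expon(scale=1.0),
    "Gamma(1.5, 3)":        stats.gamma(a=1.5, scale=1.0 / 3.0),  # scipy scale = 1/beta
    "InverseGamma(2, 0.5)": stats.invgamma(a=2.0, scale=0.5),
}
for name, dist in candidates.items():
    print(f"{name:>20}: median = {dist.median():.3f}, P(sd < 0.1) = {dist.cdf(0.1):.4f}")
# Exponential(1) and Gamma(1.5, 3) both keep ~10% of their mass below 0.1,
# while InverseGamma(2, 0.5) keeps ~4% and almost nothing below 0.05.
```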
I would be happy to help update the docs after I understand this better.