These are all great questions.
I have not used `LKJCholeskyCov` at work yet, only with some toy examples that did not go beyond 10 dimensions.
As I said, I have not used it for work purposes. However, there are some posts talking about the speed gains of using the Cholesky decomposition instead of the full covariance matrix. I recall @twiecki looked into this in detail in the past.
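For reference, here is a minimal sketch of the kind of toy model I mean, assuming PyMC3 and made-up data (the dimensionality, priors, and variable names are just illustrative):

```python
import numpy as np
import pymc3 as pm

D = 5                            # toy dimensionality
data = np.random.randn(100, D)   # placeholder observations

with pm.Model() as model:
    # Prior on the per-dimension standard deviations
    sd_dist = pm.HalfCauchy.dist(beta=2.5)
    # Packed lower-triangular Cholesky factor of the covariance
    packed_chol = pm.LKJCholeskyCov("packed_chol", n=D, eta=2.0, sd_dist=sd_dist)
    chol = pm.expand_packed_triangular(D, packed_chol, lower=True)
    mu = pm.Normal("mu", 0.0, 10.0, shape=D)
    # Parametrizing MvNormal with the Cholesky factor avoids having to
    # decompose a full covariance matrix at every evaluation
    obs = pm.MvNormal("obs", mu=mu, chol=chol, observed=data)
    trace = pm.sample(1000, tune=1000)
```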
By that, do you mean effective sample size? Regrettably, ESS depends on the rest of the model and on what you are observing. For multivariate normals using `LKJCholeskyCov`, there should be little penalty in increasing the dimensionality thanks to the HMC steps, because these scale better in high dimensions. There's a paper that proves that, for well-behaved problems, you need on the order of O(D^{1/4}) samples to get a well-converged chain with HMC.
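If it helps, a quick way to check ESS empirically for your own model, reusing the `trace` and `model` from the sketch above and assuming arviz is installed:

```python
import arviz as az

# Convert the PyMC3 trace and inspect per-parameter effective sample size;
# comparing this across different D gives a model-specific scaling picture
idata = az.from_pymc3(trace, model=model)
print(az.ess(idata))                         # bulk ESS per variable
print(az.summary(idata, var_names=["mu"]))   # includes ess_bulk / ess_tail
```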
I’m not sure about this point. The parameter state space is affected by the amount of data, but mostly by how the data changes the curvature of the model’s parameter space. If your assumed generative process is badly specified, no amount of data will help you converge better. I don’t know of an optimal bound, but I think the O(D^{1/4}) scaling should roughly hold for the amount of data as well, as long as the model is well behaved.