LKJCholeskyCov maximum dimension

What is the maximum size of covariance matrix that LKJCholeskyCov can handle? I only see examples with small dimensions (2 or 3). Can it be used efficiently for 10 or 30 dimensions? Larger?

I am new to this package…


It stores the lower triangular part of the matrix, so its memory usage scales like n(n+1)/2, where n is the number of dimensions. That is O(n^2), which isn’t much by itself, but it can add up when n ~ 1000. 30 dimensions should be fine though.
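To make the scaling concrete, here is a small sketch (plain Python, not part of the PyMC API) of how the number of stored entries grows with n:

```python
def lower_triangular_size(n):
    """Number of entries stored for the lower-triangular
    Cholesky factor of an n x n covariance matrix."""
    return n * (n + 1) // 2

for n in (3, 30, 1000):
    # e.g. n=30 needs 465 entries; n=1000 already needs 500500.
    print(n, lower_triangular_size(n))
```

So going from 30 to 1000 dimensions multiplies the storage by more than a thousand, which is why the quadratic growth only starts to hurt at high n.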

What is the highest dimension you have modeled? Was it efficient? Also, what is the relationship between the sample size and the dimension?

Obviously, the more data you have, the better. But how do you know whether you have enough data for the dimension of the problem at hand?

Thanks a lot!

These are all great questions.

I have not used LKJCholeskyCov at work yet. I have used it with some toy examples which did not go beyond 10 dimensions.

As I said, I did not use it for work purposes. However, there are some posts discussing the speed gains from using the Cholesky decomposition instead of the full covariance matrix. I recall @twiecki looked at this in detail in the past.
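This isn’t from those posts, but here is a hedged NumPy sketch of why the Cholesky parameterization tends to be cheaper and more stable: the multivariate-normal log-density needs a quadratic form and a log-determinant, and both come almost for free from a single Cholesky factorization instead of an explicit matrix inverse and determinant:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30

# Build a random symmetric positive-definite covariance and a data point.
A = rng.standard_normal((n, n))
cov = A @ A.T + n * np.eye(n)
x = rng.standard_normal(n)

# Naive route: explicit inverse and determinant (numerically fragile,
# and det(cov) can over/underflow for large n).
quad_naive = x @ np.linalg.inv(cov) @ x
logdet_naive = np.log(np.linalg.det(cov))

# Cholesky route: one factorization cov = L @ L.T, then a solve against L.
# The log-determinant falls out of the diagonal of L.
L = np.linalg.cholesky(cov)
z = np.linalg.solve(L, x)  # a dedicated triangular solve would be faster still
quad_chol = z @ z
logdet_chol = 2.0 * np.log(np.diag(L)).sum()

print(np.allclose(quad_naive, quad_chol))      # True
print(np.allclose(logdet_naive, logdet_chol))  # True
```

The two routes agree numerically, but the Cholesky version reuses one factorization for both quantities and never forms an inverse, which is roughly the advantage those posts were measuring.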

By that, do you mean effective sample size? Regrettably, ESS depends on the rest of the model and on what you are observing. For multivariate normals using LKJCholeskyCov, there should be little penalty in increasing the dimensionality thanks to the HMC steps, which scale better in high dimensions. There’s a paper that proves that, for well-behaved problems, you need only on the order of O(D^{1/4}) samples to get a well-converged chain with HMC.

I’m not sure about this point. The parameter state space is affected by the amount of data, but mostly through how the data changes the curvature of the model’s parameter space. If your assumed generative process is badly specified, no amount of data will help you converge better. I don’t know of an optimal bound, but I think the O(D^{1/4}) scaling should roughly hold for the amount of data as well, as long as the model is well behaved.

The example you linked to (my old question) is a special case where I was fitting a multivariate normal with a special covariance matrix. To my knowledge, the trick used in that model only works when the covariance matrix is a weighted sum of two known, fixed covariance matrices. I’m not sure it’s relevant to your question.