How to model correlation matrices?

I have a set of correlation matrices, and I am trying to use the decomposition \Sigma = W L W^\top, where \Sigma is a correlation matrix. My goal is to estimate W and L. I was thinking of using the LKJCholeskyCov distribution to model \Sigma, but how would I pass multiple \Sigma matrices to the distribution as observed variables? Is it possible to do this type of Bayesian analysis inside this framework?


The representation \Sigma=WLW^T for some diagonal L and triangular W is used under the hood IIRC for the LKJ distribution, so that should be okay. Do you have multiple observed correlation matrices, so \Sigma is known but W, L are unknown? Or, are you wishing to have multivariate normal data with an LKJ prior where the MVN vectors are observed?
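For concreteness, the \Sigma = W L W^\top factorization (unit lower triangular W, diagonal L; i.e. an LDL^\top decomposition) of a known correlation matrix can be computed directly from its Cholesky factor. A minimal NumPy sketch, using an arbitrary example matrix:

```python
import numpy as np

# An illustrative 3x3 correlation matrix (any valid one works).
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])

# Cholesky factor C with Sigma = C C^T.
C = np.linalg.cholesky(Sigma)

# Rescale to the W L W^T form: W unit lower triangular, L diagonal.
d = np.diag(C)          # positive diagonal of the Cholesky factor
W = C / d               # divide each column by its diagonal entry
L = np.diag(d ** 2)     # L holds the squared diagonal entries

# Sanity check: the factorization reproduces Sigma exactly.
assert np.allclose(W @ L @ W.T, Sigma)
assert np.allclose(np.diag(W), 1.0)
```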

I have multiple observed correlation matrices, so \Sigma is known. I am trying to estimate W and L for each correlation matrix, with the elements of W following a spike-and-slab prior. I would also like to put priors on L, but I am not sure whether that is possible under the current framework.

There may be issues with identifiability and uniqueness of solutions that require careful thought. Also, do you actually need L in the decomposition W L W^\top? Usually L here is diagonal and represents the variances, which will all equal 1 since you said these are correlation matrices rather than covariance matrices.

You could always put a prior on some sequence of random variables, reshape it into a lower triangular matrix A, and then form A A^\top. If you are careful with your prior, making sure the diagonal entries of A are nonzero so that A A^\top is positive definite rather than merely positive semidefinite, it should work. However, unless you have some special insight or constraints motivating the spike-and-slab prior on A, I would be cautious about blindly putting priors on everything.
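A minimal NumPy sketch of that construction; the normal draws stand in for whatever prior you choose, and exponentiating the diagonal is one common way to keep it strictly positive (in PyMC, I believe pm.expand_packed_triangular handles the packing step):

```python
import numpy as np

n = 4                            # dimension of the target matrix
n_free = n * (n + 1) // 2        # number of free lower-triangular entries

rng = np.random.default_rng(0)
z = rng.normal(size=n_free)      # stand-in for draws from your prior

# Pack the vector into a lower triangular matrix A.
A = np.zeros((n, n))
A[np.tril_indices(n)] = z

# Keep the diagonal strictly positive so that A A^T is positive
# definite, not merely positive semidefinite.
A[np.diag_indices(n)] = np.exp(A[np.diag_indices(n)])

M = A @ A.T
eigvals = np.linalg.eigvalsh(M)
assert np.all(eigvals > 0)       # all eigenvalues are positive
```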


There is a notebook on probabilistic PCA that shows how to do this.

@Dushyant_Sahoo are you certain that you want a decomposition of \Sigma into lower-triangular factors (as opposed to the more common orthogonal decomposition, as in PCA)? The spike-and-slab prior really makes me think you want some Bayesian analog of sparse PCA.

Obviously, for a sparse W, the lower triangular representation \hat W with \hat W \hat W^\top = W W^\top need not itself be sparse, so placing sparsity-inducing priors on the lower triangular representation is not particularly well motivated. At the same time, placing sparse priors on W need not yield an identifiable model (indeed, permutation invariance is untouched by a sparse prior). I have only seen implementations that enforce an ordering constraint directly (e.g. by permuting, at each sampling step, so that the “most sparse” row comes first), an approach which is incompatible with NUTS.

1 Like

I apologize for the confusion; I should have written the question in more detail. In the matrix version of the problem, \Sigma^i = W \Lambda^i W^\top, where i goes from 1 to n, W is a shared loading matrix that is sparse (which helps with interpretability) but not necessarily orthogonal or lower triangular, and \Lambda^i is a diagonal matrix. In this case, I put a constraint on \Lambda^i such that its trace is 1, which helps with identifiability. I currently get point estimates by solving a minimization problem, but I am looking for a Bayesian formulation where I can use prior knowledge. Thanks for the discussion.
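To make the setup concrete, here is a small NumPy sketch of how synthetic data of this form could be generated; the dimensions and the 50% sparsity level are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, n = 5, 3, 4    # observed dim, latent dim, number of matrices

# A hypothetical shared sparse loading matrix W:
# zero out roughly half of the entries.
W = rng.normal(size=(m, k))
W[rng.random(size=(m, k)) < 0.5] = 0.0

Sigmas = []
for i in range(n):
    lam = rng.random(size=k)
    lam /= lam.sum()    # enforce the trace-1 constraint on Lambda^i
    Sigmas.append(W @ np.diag(lam) @ W.T)

# Note: W Lambda^i W^T is symmetric positive semidefinite, but not
# automatically a correlation matrix; a diagonal rescaling would be
# needed to give it a unit diagonal.
```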

Not really, as permutation invariance is an issue. Specifically, even with a shared W, the solution to your system is not unique: if (W, \Lambda) is a solution, then (W\Omega, \Omega^\top\Lambda\Omega) is also a solution for any signed permutation matrix \Omega, since (W\Omega)(\Omega^\top\Lambda\Omega)(W\Omega)^\top = W\Omega\Omega^\top\Lambda\Omega\Omega^\top W^\top = W\Lambda W^\top, and \Omega^\top\Lambda\Omega is again diagonal. The “lower triangular W (with increasing diagonal)” constraint is a common way to force a specific representative from the space of solutions.
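The non-uniqueness is easy to verify numerically; a quick NumPy check with a permutation matrix (the sizes and \Lambda values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(4, 3))        # an arbitrary 4x3 loading matrix
lam = np.array([0.5, 0.3, 0.2])    # diagonal of Lambda, trace 1

Sigma = W @ np.diag(lam) @ W.T

# A permutation matrix P is orthogonal, and P^T Lambda P is again
# diagonal, so (W P, P^T Lambda P) reproduces exactly the same Sigma.
P = np.eye(3)[:, [2, 0, 1]]
W2 = W @ P
Lam2 = P.T @ np.diag(lam) @ P

assert np.allclose(W2 @ Lam2 @ W2.T, Sigma)
assert np.allclose(Lam2, np.diag(np.diag(Lam2)))   # Lam2 is still diagonal
```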

You can take W to be arbitrary and simply draw from a sparse prior with shape=(m, k), but you will run into identifiability problems which, as far as I know, cannot be trivially solved while keeping W an arbitrary sparse (m, k) matrix.


Yes, you are right. I guess in my case I would be fine with getting any solution for W, but I probably won’t be able to achieve that through sampling. If I take the optimization route using gradient descent, then starting from an initial point will return a single solution, and solutions from different initial points will just be permutations of each other. In this case, if I use ADVI, I should be ok, right?

ADVI might do it. Another possibility is to break the permutation symmetry at the prior level by regularizing each row of W differently; for instance, letting W_i \sim \mathrm{Lap}(\lambda_i, \mathrm{shape}=(1,k)) with \lambda_i > \lambda_j for i > j (reading \lambda as a rate, so larger \lambda means sparser). This would make the first row the “least sparse”. It’s not a guarantee, but it at least “prefers” an ordering in terms of row sparsity. Sparsity itself would also break the rotational symmetry, since nearly all rotations of a sparse basis result in a non-sparse matrix. Again, these are not guarantees, but they should help in practice.
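The point that rotation destroys sparsity is easy to illustrate; a small NumPy check with a hand-built sparse W and a generic random rotation:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hand-built sparse loading matrix: two-thirds of the entries are zero.
W = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])

# A generic rotation: the orthogonal factor of a random matrix's QR.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))

W_rot = W @ Q

# Rotation preserves W W^T but destroys sparsity.
sparsity_before = np.mean(np.abs(W) < 1e-8)       # fraction of zero entries
sparsity_after = np.mean(np.abs(W_rot) < 1e-8)

assert np.allclose(W_rot @ W_rot.T, W @ W.T)
assert sparsity_after < sparsity_before
```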

Good luck!

Thanks! That is really helpful!