How to model correlation matrices?

I have a set of correlation matrices, and I am trying to use the decomposition \Sigma = W L W^\top, where \Sigma is a correlation matrix. My goal is to estimate W and L. I was thinking of using the LKJCholeskyCov distribution to model \Sigma, but how would I pass multiple \Sigma matrices to the distribution as observed variables? Is it possible to do this type of Bayesian analysis inside this framework?


The representation \Sigma=WLW^T for some diagonal L and triangular W is used under the hood IIRC for the LKJ distribution, so that should be okay. Do you have multiple observed correlation matrices, so \Sigma is known but W, L are unknown? Or, are you wishing to have multivariate normal data with an LKJ prior where the MVN vectors are observed?
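For concreteness, the \Sigma = W L W^\top factorization (unit lower triangular W, diagonal L; i.e. an LDL^\top decomposition) of a known correlation matrix can be computed directly from its Cholesky factor. A minimal NumPy sketch, using an arbitrary example matrix:

```python
import numpy as np

# An illustrative 3x3 correlation matrix (any valid one works).
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])

# Cholesky factor C with Sigma = C C^T.
C = np.linalg.cholesky(Sigma)

# Rescale to the W L W^T form: W unit lower triangular, L diagonal.
d = np.diag(C)          # positive diagonal of the Cholesky factor
W = C / d               # divide each column by its diagonal entry
L = np.diag(d ** 2)     # L holds the squared diagonal entries

# Sanity check: the factorization reproduces Sigma exactly.
assert np.allclose(W @ L @ W.T, Sigma)
assert np.allclose(np.diag(W), 1.0)
```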

I have multiple observed correlation matrices, so \Sigma is known. I am trying to estimate W and L for each correlation matrix, with the elements of W following a spike-and-slab prior. I would also like to put priors on L, but I am not sure whether that is possible under the current framework.

There may be issues with identifiability and uniqueness of solutions that require careful thought. Also, do you actually need L in the decomposition W L W^\top? Usually L here is diagonal and represents the variances, which will all equal 1 since you said these are correlation matrices rather than covariance matrices.

You could always put a prior on some sequence of random variables, reshape it into a lower triangular matrix A, and then form A A^\top. If you are careful with your prior, making sure the diagonal entries of A are nonzero so that A A^\top is positive definite rather than merely positive semidefinite, it should work. However, unless you have some special insight or constraints motivating the spike-and-slab prior on A, I would be cautious about blindly putting priors on everything.
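A minimal NumPy sketch of that construction; the normal draws stand in for whatever prior you choose, and exponentiating the diagonal is one common way to keep it strictly positive (in PyMC, I believe pm.expand_packed_triangular handles the packing step):

```python
import numpy as np

n = 4                            # dimension of the target matrix
n_free = n * (n + 1) // 2        # number of free lower-triangular entries

rng = np.random.default_rng(0)
z = rng.normal(size=n_free)      # stand-in for draws from your prior

# Pack the vector into a lower triangular matrix A.
A = np.zeros((n, n))
A[np.tril_indices(n)] = z

# Keep the diagonal strictly positive so that A A^T is positive
# definite, not merely positive semidefinite.
A[np.diag_indices(n)] = np.exp(A[np.diag_indices(n)])

M = A @ A.T
eigvals = np.linalg.eigvalsh(M)
assert np.all(eigvals > 0)       # all eigenvalues are positive
```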


There is a notebook on probabilistic PCA that shows how to do this.

@Dushyant_Sahoo are you certain that you want a decomposition of \Sigma into lower-triangular factors (as opposed to the more common orthogonal decomposition, as in PCA)? The spike-and-slab prior really makes me think you want some Bayesian analog of sparse PCA.

Obviously, for a sparse W, the lower triangular representation \hat W with \hat W \hat W^\top = W W^\top need not itself be sparse, so placing sparsity-inducing priors on the lower triangular representation is not particularly well motivated. At the same time, placing sparse priors on W need not yield an identifiable model (indeed, permutation invariance is untouched by a sparse prior). I have only seen implementations that enforce an ordering constraint directly (e.g. by permuting, at each sampling step, so that the “most sparse” row comes first), an approach which is incompatible with NUTS.

1 Like

I apologize for the confusion; I should have written the question in more detail. In the matrix version of the problem, \Sigma^i = W \Lambda^i W^\top, where i goes from 1 to n, W is a shared loading matrix that is sparse (which helps with interpretability) but not necessarily orthogonal or lower triangular, and \Lambda^i is a diagonal matrix. In this case, I put a constraint on \Lambda^i such that its trace is 1, which helps with identifiability. I currently get point estimates by solving a minimization problem, but I am looking for a Bayesian formulation where I can use prior knowledge. Thanks for the discussion.
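To make the setup concrete, here is a small NumPy sketch of how synthetic data of this form could be generated; the dimensions and the 50% sparsity level are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, n = 5, 3, 4    # observed dim, latent dim, number of matrices

# A hypothetical shared sparse loading matrix W:
# zero out roughly half of the entries.
W = rng.normal(size=(m, k))
W[rng.random(size=(m, k)) < 0.5] = 0.0

Sigmas = []
for i in range(n):
    lam = rng.random(size=k)
    lam /= lam.sum()    # enforce the trace-1 constraint on Lambda^i
    Sigmas.append(W @ np.diag(lam) @ W.T)

# Note: W Lambda^i W^T is symmetric positive semidefinite, but not
# automatically a correlation matrix; a diagonal rescaling would be
# needed to give it a unit diagonal.
```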

Not really, as permutation invariance is an issue. Specifically, even with a shared W, the solution to your system is not unique: if (W, \Lambda) is a solution, then (W\Omega, \Omega^\top\Lambda\Omega) is also a solution for any signed permutation matrix \Omega, since (W\Omega)(\Omega^\top\Lambda\Omega)(W\Omega)^\top = W\Omega\Omega^\top\Lambda\Omega\Omega^\top W^\top = W\Lambda W^\top, and \Omega^\top\Lambda\Omega is again diagonal. The “lower triangular W (with increasing diagonal)” constraint is a common way to force a specific representative from the space of solutions.
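The non-uniqueness is easy to verify numerically; a quick NumPy check with a permutation matrix (the sizes and \Lambda values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(4, 3))        # an arbitrary 4x3 loading matrix
lam = np.array([0.5, 0.3, 0.2])    # diagonal of Lambda, trace 1

Sigma = W @ np.diag(lam) @ W.T

# A permutation matrix P is orthogonal, and P^T Lambda P is again
# diagonal, so (W P, P^T Lambda P) reproduces exactly the same Sigma.
P = np.eye(3)[:, [2, 0, 1]]
W2 = W @ P
Lam2 = P.T @ np.diag(lam) @ P

assert np.allclose(W2 @ Lam2 @ W2.T, Sigma)
assert np.allclose(Lam2, np.diag(np.diag(Lam2)))   # Lam2 is still diagonal
```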

You can take W to be arbitrary and simply draw from a sparse prior with shape=(m, k), but you will run into identifiability problems which, as far as I know, cannot be trivially solved while keeping W an arbitrary sparse (m, k) matrix.


Yes, you are right. I guess in my case I would be fine with getting any solution for W, but I probably won’t be able to achieve that through sampling. If I take the optimization route using gradient descent, then starting from an initial point will return a single solution, and solutions from different initial points will just be permutations of each other. In this case, if I use ADVI, I should be ok, right?

ADVI might do it. Another possibility is to break the permutation symmetry at the prior level by regularizing each row of W differently; for instance, letting W_i \sim \mathrm{Lap}(\lambda_i, \mathrm{shape}=(1,k)) with \lambda_i > \lambda_j for i > j (reading \lambda as a rate, so larger \lambda means sparser). This would make the first row the “least sparse”. It’s not a guarantee, but it at least “prefers” an ordering in terms of row sparsity. Sparsity itself would also break the rotational symmetry, since nearly all rotations of a sparse basis result in a non-sparse matrix. Again, these are not guarantees, but they should help in practice.
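The point that rotation destroys sparsity is easy to illustrate; a small NumPy check with a hand-built sparse W and a generic random rotation:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hand-built sparse loading matrix: two-thirds of the entries are zero.
W = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])

# A generic rotation: the orthogonal factor of a random matrix's QR.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))

W_rot = W @ Q

# Rotation preserves W W^T but destroys sparsity.
sparsity_before = np.mean(np.abs(W) < 1e-8)       # fraction of zero entries
sparsity_after = np.mean(np.abs(W_rot) < 1e-8)

assert np.allclose(W_rot @ W_rot.T, W @ W.T)
assert sparsity_after < sparsity_before
```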

Good luck!

Thanks! That is really helpful!