Modelling multivariate distributions with a sparse covariance structure

Thanks for your helpful suggestions. Here is some extra information regarding your comments:

Not really: these 2000 kernels are in fact a dimensionality-reduced subspace of a higher-dimensional feature set (90K feature dimensions that are not independent).

Interesting question. I don’t think I want to enforce zeros; I just thought that doing so would reduce the complexity of the model.
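For a sense of scale, here is a rough count of free parameters under three covariance structures for p = 2000. The rank k = 20 and the 1% density of nonzero off-diagonal entries are arbitrary illustrative choices, not values taken from the data. Fewer parameters does not automatically mean an easier fit: a covariance matrix with enforced zeros must still be positive definite, which is an awkward constraint, whereas a low-rank-plus-diagonal form is positive definite by construction.

```python
def n_params_dense(p):
    """Free parameters in an unconstrained p x p covariance matrix."""
    return p * (p + 1) // 2


def n_params_sparse(p, density):
    """Diagonal plus a given fraction of the p(p-1)/2 off-diagonal pairs (illustrative)."""
    n_pairs = p * (p - 1) // 2
    return p + round(density * n_pairs)


def n_params_low_rank_plus_diag(p, k):
    """Sigma = L L^T + D with L of shape (p, k) and D diagonal.
    Ignores the rotational non-identifiability of L (k(k-1)/2 fewer effective parameters)."""
    return p * k + p


p = 2000
print(n_params_dense(p))                    # dense covariance
print(n_params_sparse(p, 0.01))             # 1% of off-diagonal pairs kept (hypothetical)
print(n_params_low_rank_plus_diag(p, 20))   # rank-20 factor structure (hypothetical k)
```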

Is there a way to test if a low-rank covariance structure exists in the data?
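One rough heuristic, offered as a sketch rather than a formal test: compare the eigenvalues of the sample correlation matrix with the Marchenko-Pastur upper edge (1 + sqrt(p/n))^2, which is roughly where the largest eigenvalue of a correlation matrix built from p independent variables and n independent observations would fall. Eigenvalues well above that edge suggest shared low-rank structure. This assumes the rows are roughly independent observations; n (the number of observations) is not stated in the post, and the synthetic data below is hypothetical.

```python
import numpy as np


def count_spikes_above_mp(X):
    """X: (n_samples, p) data matrix.

    Returns the eigenvalues of the sample correlation matrix (descending),
    the number exceeding the Marchenko-Pastur upper edge, and the edge itself.
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)           # p x p correlation matrix
    eigvals = np.linalg.eigvalsh(R)[::-1]      # eigvalsh is ascending; flip it
    mp_upper = (1.0 + np.sqrt(p / n)) ** 2     # edge expected under independence
    return eigvals, int(np.sum(eigvals > mp_upper)), mp_upper


# Hypothetical example: rank-5 signal plus independent noise
rng = np.random.default_rng(0)
n, p, k = 1000, 200, 5
F = rng.standard_normal((n, k))                # latent factors
W = rng.standard_normal((p, k))                # loadings
X = F @ W.T + rng.standard_normal((n, p))
eigvals, n_spikes, edge = count_spikes_above_mp(X)
print(n_spikes, round(edge, 3))
```

With p = 2000 kernels, a useful diagnostic is also a scree plot of `eigvals` on a log scale: a few large eigenvalues followed by a flat bulk is the typical signature of a low-rank-plus-noise structure.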

I have extracted some plots to visualize the sparsity of the covariance.

Here is the 2000x2000 correlation matrix across kernels (absolute values of Pearson’s correlation are plotted):

[image: absolute Pearson correlation matrix, 2000 x 2000 kernels]

And this is the histogram of its upper triangle (note that the y-axis is logarithmic):

[image: histogram of upper-triangle absolute correlations, logarithmic y-axis]
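For reference, the quantities in these two plots can be computed roughly as follows. This is a minimal sketch assuming the data sit in an array `X` of shape (n_samples, n_kernels); the random data and the bin count are placeholders, and the strict upper triangle (`k=1`) excludes the diagonal of ones.

```python
import numpy as np


def abs_corr_upper_triangle(X):
    """Absolute Pearson correlations between all pairs of columns of X,
    returned as a flat vector (strict upper triangle, diagonal excluded)."""
    R = np.corrcoef(X, rowvar=False)
    iu = np.triu_indices_from(R, k=1)
    return np.abs(R[iu])


rng = np.random.default_rng(1)
X = rng.standard_normal((500, 50))   # hypothetical data: 500 samples, 50 kernels
r = abs_corr_upper_triangle(X)
counts, edges = np.histogram(r, bins=20, range=(0.0, 1.0))
# With matplotlib, a log-scaled y-axis as in the plot would be e.g.:
#   plt.hist(r, bins=50); plt.yscale("log")
```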

To me, these plots indicate that most of the correlations are close to zero, which is why I expected a sparse model to help.
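One caveat worth checking: "close to zero" is relative to the sampling noise of a correlation estimate, which for independent variables is roughly N(0, 1/n). Many small correlations can also be produced by a weak shared factor, i.e. a dense low-rank structure rather than a sparse one. The sketch below compares the fraction of pairs with |r| > z/sqrt(n) against the fraction expected under independence; the threshold z = 3 and both synthetic datasets are illustrative assumptions.

```python
import numpy as np
from math import erfc, sqrt


def excess_over_noise(X, z=3.0):
    """Fraction of pairwise |r| above z / sqrt(n), and the fraction expected
    if all columns were independent (normal approximation r ~ N(0, 1/n))."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    r = np.abs(R[np.triu_indices(p, k=1)])
    observed = float(np.mean(r > z / sqrt(n)))
    expected = erfc(z / sqrt(2.0))   # two-sided normal tail P(|Z| > z)
    return observed, expected


rng = np.random.default_rng(2)
n, p = 2000, 100
X_indep = rng.standard_normal((n, p))           # no structure at all
f = rng.standard_normal((n, 1))                 # one weak shared factor
w = 0.4 * rng.standard_normal((1, p))           # small loadings (hypothetical)
X_factor = f @ w + rng.standard_normal((n, p))  # small but non-zero correlations everywhere
print(excess_over_noise(X_indep))
print(excess_over_noise(X_factor))
```

If the observed fraction on the real data is far above the expected one even though the individual |r| are small, that points towards many weak dependencies, which a sparse model would discard and a low-rank model could capture.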

Is there a way to test if the low-rank assumption holds?
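A widely used permutation-based check is Horn's parallel analysis: shuffle each column independently (which keeps the marginal distributions but destroys the correlations), recompute the correlation eigenvalues, and retain the leading components whose observed eigenvalue exceeds, say, the 95th percentile of the permuted ones. The sketch below is one way to do this under those assumptions; `n_perm`, `quantile`, and the synthetic data are illustrative choices. For p = 2000 each eigendecomposition takes noticeably longer, so a modest `n_perm` is advisable.

```python
import numpy as np


def parallel_analysis(X, n_perm=20, quantile=0.95, seed=0):
    """Horn's parallel analysis on the correlation matrix of X (n_samples x p).

    Returns the number of leading components whose eigenvalues exceed the
    given quantile of eigenvalues from column-permuted data, plus both spectra.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    null = np.empty((n_perm, p))
    for b in range(n_perm):
        # Permute each column independently: marginals kept, correlations destroyed
        Xp = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
        null[b] = np.linalg.eigvalsh(np.corrcoef(Xp, rowvar=False))[::-1]
    thresh = np.quantile(null, quantile, axis=0)
    above = obs > thresh
    # Count leading components up to the first one that fails the comparison
    k = p if above.all() else int(np.argmin(above))
    return k, obs, thresh


# Hypothetical example: rank-3 signal plus independent noise
rng = np.random.default_rng(0)
n, p, k_true = 500, 100, 3
F = rng.standard_normal((n, k_true))
W = rng.standard_normal((p, k_true))
X = F @ W.T + rng.standard_normal((n, p))
k_hat, obs, thresh = parallel_analysis(X, n_perm=10)
print(k_hat)
```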