Thanks for your helpful suggestions. Here is some extra information regarding your comments:
Not really. These 2000 kernels are in fact a dimensionality-reduced subspace of a higher-dimensional feature set (90K feature dimensions that are not independent).
Interesting question. I don’t think I want to enforce zeros; I just thought that would reduce the model’s complexity.
Is there a way to test if a low-rank covariance structure exists in the data?
I have made some plots to visualize the sparsity of the covariance.
Here is the 2000x2000 correlation matrix across kernels (absolute values of Pearson’s correlation are plotted):

and this is the histogram of the upper triangle (note that the y-axis is logarithmic):

To me, these indicate that the majority of the correlations are close to zero, which is why I expected a sparse model to help.
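For reference, here is a minimal sketch of how a summary like the one above can be computed. The array `X` (samples by kernels), the helper name `upper_triangle_abs_corr`, and the 0.1 cutoff are assumptions for illustration, and the data is a small synthetic stand-in, not the real 2000-kernel set.

```python
import numpy as np

def upper_triangle_abs_corr(X):
    """Return |Pearson r| for every distinct pair of columns of X."""
    corr = np.corrcoef(X, rowvar=False)      # (n_kernels, n_kernels) correlation matrix
    iu = np.triu_indices_from(corr, k=1)     # indices strictly above the diagonal
    return np.abs(corr[iu])

# Synthetic stand-in data (hypothetical): 500 samples, 50 independent kernels.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 50))

abs_r = upper_triangle_abs_corr(X)

# Histogram counts; plot these with a logarithmic y-axis to mimic the figure.
counts, edges = np.histogram(abs_r, bins=20, range=(0.0, 1.0))

# Fraction of pairs with |r| below an arbitrary illustrative cutoff of 0.1.
frac_small = float(np.mean(abs_r < 0.1))
print(f"{frac_small:.1%} of pairs have |r| < 0.1")
```

The fraction of near-zero pairs is only a rough descriptive number; the cutoff is arbitrary and should be chosen with the sample size in mind.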
So, to come back to my question: is there a way to test whether the low-rank assumption holds?
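To make the question concrete, here is a sketch of one informal check: look at how quickly the eigenvalue spectrum of the sample covariance decays. This is a heuristic, not a formal statistical test. The function name `components_for_variance`, the 95% threshold, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def components_for_variance(X, threshold=0.95):
    """Number of principal components needed to reach `threshold` of total variance.

    Also returns the cumulative explained-variance curve.
    """
    Xc = X - X.mean(axis=0)                       # center each column
    s = np.linalg.svd(Xc, compute_uv=False)       # singular values, descending
    var = s ** 2                                  # proportional to covariance eigenvalues
    cum = np.cumsum(var) / var.sum()              # cumulative explained variance
    k = int(np.searchsorted(cum, threshold) + 1)  # first k with cum[k-1] >= threshold
    return k, cum

rng = np.random.default_rng(0)
n_samples, n_features, true_rank = 300, 40, 5

# Synthetic low-rank data: 5 latent factors plus a little noise (hypothetical).
latent = rng.standard_normal((n_samples, true_rank))
loadings = rng.standard_normal((true_rank, n_features))
X_lowrank = latent @ loadings + 0.01 * rng.standard_normal((n_samples, n_features))

# Synthetic data with no low-rank structure: independent noise.
X_noise = rng.standard_normal((n_samples, n_features))

k_lowrank, _ = components_for_variance(X_lowrank)
k_noise, _ = components_for_variance(X_noise)
print(k_lowrank, k_noise)
```

If a handful of components captures most of the variance, a low-rank (or low-rank plus diagonal) model looks plausible; if the curve rises slowly, as it does for the independent-noise example, it does not.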