Modelling multivariate distributions with a sparse covariance structure


I am currently building a hierarchical bayesian regression model that generates separate estimates for multiple dependent variables (I call them kernels, and there are 2000 separate kernels to be exact). Currently, each of these kernels is being modeled independently using a separate hierarchical model, i.e. each kernel is related to the independent prior observations according to a hierarchical bayesian structure.

However, I now require to also model the observed covariances between these kernel distributions at any point (as if they belong to a multivariate normal with a dependent covariance structure). Given that there are a total of 2000 kernels, I would imagine that modeling a 2000x2000 covariance matrix would be very costly. On the other hand, from manual observations, I am able to tell that these kernels are mostly independent and the covariance structure would hence be very sparse.

I was wondering if there is any way to use the multivariate normal distribution from PyMC to appropriately estimate a sparse covariance structure.

Also to give you more detail, I am using ADVI to enable faster fits of these 2000 kernels, so any suggestions should also most likely be compatible with ADVI.

I wasn’t able to find any documentation on modeling sparse covariances for multivariate normals and any suggestions/assistance would be much appreciated.


There’s a few options here:

  1. Do you have any known grouping structure to this data? For example, if you get 2000 dimensions from 10 variables X 40 timesteps X 5 spatial locations, you could use a Kronecker structured Normal model to represent the 2000-dimensional covariance in terms of the Kronecker product of 10x10, 40x40 and 5x5 matrices.

  2. Is a low rank assumption for the covariance okay? Or do you want to get actual zeros (or close to zero) values for the posterior off-diagonal elements? If the former, you can try something like this:

# m is the rank of the resulting covariance matrix
with pm.Model() as model:
    sd = pm.HalfCauchy('sd', 2.5)
    low_rank = pm.Normal('low_rank', mu=0, sd=sd, shape=(p, m))
    cov = pm.Deterministic('cov',, low_rank.T))

    # Sample from multivariate normal
    Y = pm.MvNormal('Y', mu=np.zeros(p), cov=cov, shape=(n, p))
    trace_low_rank = pm.sample(1000)

Here’s a Colab notebook with a comparison between LKJ and low rank covariance priors.

1 Like

Thanks for your helpful suggestions. Here is some extra information regarding your comments:

Not really, these 2000 kernels are in fact a dimensionality-reduced subspace of a higher dimensional feature set (90K feature dimensions that are not independent).

Interesting question, I don’t think I want to enforce zeros, I just thought that would simplify the complexity.

Is there a way to test if a low-rank covariance structure exists in the data?

I have extracted some plots to visualize the sparsity of the covariance.

Here is the 2000x2000 correlation matrix across kernels (absolute values of Pearson’s correlation are plotted):


and this is the histogram of the upper triangle (notice the y-axis is logarithmic):


To me, these indicate that the majority of the correlations are close to zero and that’s why I expected a sparse model would help.

Is there a way to test if the low-rank assumption holds?

P.S. Given that there is a strong diagonal, is that suggesting the low-rank assumption does not hold?