Covariate matrices as observations

Dear All,

I would like to produce random covariance matrices in order to make pairwise comparison between them. I am looking for chances of accidental similarity. These random matrices would be inspired by experimental covariance matrices, so I would like to infer the joint sd and eta parameters and make posterior predictions along the lines shown below.

Problem 1:
I do not know how to feed multiple observed matrices to LKJCholeskyCov distribution, I get a dimension error when I trying to use more than one packed Ls.
Problem 2:
The LKJCholeskyCov distribution does not have a random function therefore the sample_ppc does not work. This would not be problem if sd is recorded in the trace, but I do not know how to achieve that in this example.

I wonder if there are solutions or workarounds for these problems. Does it make sense what I am trying to do?


with pm.Model() as model:
sd=pm.HalfCauchy.dist(beta=2.5, shape=3)
packed_L = pm.LKJCholeskyCov(‘packed_L’, n=3, eta=eta, sd_dist=sd, observed=obs)

with model:
trace = pm.sample()

with model:

obs looks something like this:

array([[ 9.860e-02, -1.310e-02, 9.990e-02, 1.850e-02, -1.190e-02,
[ 9.020e-02, -1.310e-02, 1.003e-01, 2.340e-02, -1.660e-02,
[ 1.069e-01, -2.400e-03, 8.370e-02, 2.360e-02, -8.800e-03,
[ 8.000e-04, 0.000e+00, 2.400e-03, 1.000e-04, -1.000e-04,
[ 7.000e-04, 0.000e+00, 2.200e-03, 3.000e-04, 0.000e+00,
[ 1.100e-03, 0.000e+00, 1.800e-03, -1.000e-04, 2.000e-04,

Have you considered using pm.Wishart instead? This is a classical distribution over psd (covariance) matrices, with an implemented random so it should be alright.

Also (for general interest) there’s recent work showing an equivalence between LKJ and a restricted form of the Wishart prior (it also gives a more efficient random for LKJ other than the onion method):

1 Like

Thanks this is a great suggestion, because pm.Wishart indeed has a random function. I am not familiar with how to use it, but I gave it a try. I still have problem defining multiple structured, higher dimensional observations. MVNormal observations are fine, but I cannot stack multiple packed L and covariance matrices in the same way (along the first axis). Is there a keyword defining along which axis the observations are stacked? The code below runs up until the posterior prediction, but I doubt that it performs what my original goal was. Nu looks like a proper MCMC trace, but I guess V has to be another positive definite random matrix with a prior (another pm.Wishart?). I get an assertion error if I do that.

for i in range(0,lendf):

with pm.Model() as model:
nu = pm.Uniform(‘nu’, 0, 100)
cov=pm.Wishart(‘cov’,nu=nu,V=V, observed=bigmtx[0])

with model:
trace = pm.sample(cores=1)

with model:
pm.sample_posterior_predictive(trace, samples=500)

You are right. There appear to be some missing features for dealing with covariance realizations. You can generate tensors of the right shape easily:

eye = np.eye(5)
nu = 20

with pm.Model() as mod:
    w = pm.Wishart('w', nu=nu, V=eye, shape=(5,5))
    trace = pm.sample_prior_predictive(12)

(12, 5, 5)

but inference is broken. WishartBartlett appears not to allow the observed keyword (???) while Wishart breaks if observed is more than 2d. So (note especially the commented [failing] line) you have to rely on a dumb hack:

with pm.Model() as mod2:
    nu_offset = pm.HalfNormal('nu_o', 3)
    nu = pm.Deterministic('nu', 20 + nu_offset)
    V_p = pm.WishartBartlett('V_p', nu=20, S=eye, is_cholesky=True)
    # w2 = pm.Wishart('w2', nu=nu, V=V_p, observed=trace['w'])  ***doesn't work!!***
    for hck in range(12):
        w2 = pm.Wishart('w%d' % hck, nu=nu, V=V_p, shape=(5,5),observed=trace['w'][hck])
    trace2 = pm.sample(500, cores=2, chains=4)

pm.traceplot(trace2, 'V_p_c');


This worked brilliantly on my data, thank you!! Of course, I hoped for a cleaner solution, but having an unlimited supply of random cov matrices definitely worth it. I did a quick comparison of the matrix traces between the original and generated covs Figure_2
Blue and orange are the data and random traces, respectively.
And this is the comparison of one of the off-diagonal elements:

The modes are reasonably close, the shapes are somewhat different, but of course the data is not random. It is still much closer than I would ever get by manual tinkering with distribution parameters. Now I just have to understand the role of the offset in nu, it is clearly necessary for NUTS to work. I would never guessed this, thank you for this insight.


for a k \times k matrix \Sigma, for \mathrm{Wishart}(\Sigma, \nu) is only defined for \nu > k. I had been playing around with the dimensions so I put an offset in to ensure I wouldn’t inadvertently sample from undefined likelihoods. The “right” way to do this would be with pm.Bound, I think.

I bump this conversion, because there is a subtle issue with the model/data described previously. When I have a non-informative prior for the degrees of freedom, the posterior settles at around 9, but posterior prediction indicates that the off diagonal components vary too much.
This figure shows the histogram of the ratio between the minimum and maximum of the eigenvalues of the data (blue) and predictions (orange).
This figure is the trace of the data and the predictions respectively.
I guess as the d.o.f increases the magnitude of thediagonal increases too which decreases the likelihood, but for the off diagonal elements a higher d.o.f would be desirable. I wonder if there is a variant of the Wishart likelihood where the elements are normalized by the degrees of freedom?