# Covariance matrices as observations

Dear All,

I would like to generate random covariance matrices in order to make pairwise comparisons between them; I want to estimate the chance of accidental similarity. These random matrices would be inspired by experimental covariance matrices, so I would like to infer the joint sd and eta parameters and draw posterior predictions along the lines shown below.

Problem 1:
I do not know how to feed multiple observed matrices to the LKJCholeskyCov distribution; I get a dimension error when I try to use more than one packed L.
Problem 2:
The LKJCholeskyCov distribution does not have a random function, so sample_ppc does not work. This would not be a problem if sd were recorded in the trace, but I do not know how to achieve that in this example.

I wonder if there are solutions or workarounds for these problems. Does what I am trying to do make sense?

Gergely

    import numpy as np
    import pymc3 as pm

    with pm.Model() as model:
        eta = pm.HalfCauchy('eta', beta=2.5)
        sd = pm.HalfCauchy.dist(beta=2.5, shape=3)
        # columns follow the packed lower-triangular order for n=3:
        # (1,1), (2,1), (2,2), (3,1), (3,2), (3,3)
        obs = np.array(df[['oddU11', 'oddU12', 'oddU22', 'oddU13', 'oddU23', 'oddU33']])[0]
        packed_L = pm.LKJCholeskyCov('packed_L', n=3, eta=eta, sd_dist=sd, observed=obs)

    with model:
        trace = pm.sample()

    with model:
        # this is where Problem 2 bites: LKJCholeskyCov has no random method
        pm.sample_ppc(trace)

obs looks something like this:

    array([[ 9.860e-02, -1.310e-02,  9.990e-02,  1.850e-02, -1.190e-02,  1.091e-01],
           [ 9.020e-02, -1.310e-02,  1.003e-01,  2.340e-02, -1.660e-02,  1.109e-01],
           [ 1.069e-01, -2.400e-03,  8.370e-02,  2.360e-02, -8.800e-03,  1.154e-01],
           ...,
           [ 8.000e-04,  0.000e+00,  2.400e-03,  1.000e-04, -1.000e-04,  1.400e-03],
           [ 7.000e-04,  0.000e+00,  2.200e-03,  3.000e-04,  0.000e+00,  1.200e-03],
           [ 1.100e-03,  0.000e+00,  1.800e-03, -1.000e-04,  2.000e-04,  1.700e-03]])

Have you considered using pm.Wishart instead? It is the classical distribution over psd (covariance) matrices, and it has an implemented random method, so it should work.
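
For a quick sense of what such draws look like, here is a sketch using scipy.stats.wishart directly (the df and scale values are arbitrary, just for illustration):

    import numpy as np
    from scipy import stats

    # ten random 3x3 positive definite matrices from a Wishart distribution
    draws = stats.wishart(df=5, scale=np.eye(3)).rvs(size=10)
    print(draws.shape)  # (10, 3, 3)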

Also (for general interest), there’s recent work showing an equivalence between LKJ and a restricted form of the Wishart prior (it also gives a more efficient random method for LKJ than the onion method):


Thanks, this is a great suggestion, because pm.Wishart indeed has a random function. I am not familiar with how to use it, but I gave it a try. I still have problems defining multiple structured, higher-dimensional observations. MvNormal observations are fine, but I cannot stack multiple packed Ls or covariance matrices in the same way (along the first axis). Is there a keyword defining the axis along which observations are stacked? The code below runs up until the posterior prediction, but I doubt it does what I originally intended. nu looks like a proper MCMC trace, but I guess V has to be another positive definite random matrix with a prior (another pm.Wishart?). I get an assertion error if I do that.

    # rebuild the full symmetric 3x3 matrices from the packed columns
    line1 = np.array(df[['oddU11', 'oddU12', 'oddU13']])
    line2 = np.array(df[['oddU12', 'oddU22', 'oddU23']])
    line3 = np.array(df[['oddU13', 'oddU23', 'oddU33']])
    lendf = len(df)
    bigmtx = np.zeros((lendf, 3, 3))
    for i in range(lendf):
        bigmtx[i] = np.array([line1[i], line2[i], line3[i]])
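
(As an aside, the same stack can be built without the explicit loop; a one-line sketch, assuming the three line arrays from above:)

    # vectorized equivalent: stack the three row arrays along a new middle axis
    bigmtx = np.stack([line1, line2, line3], axis=1)  # shape (lendf, 3, 3)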

    with pm.Model() as model:
        nu = pm.Uniform('nu', 0, 100)
        V = bigmtx[0] / 3.
        # V = pm.Wishart('V', nu=nu, V=bigmtx[0] / 3.)  # this raises an assertion error
        cov = pm.Wishart('cov', nu=nu, V=V, observed=bigmtx[0])

    with model:
        trace = pm.sample(cores=1)

    with model:
        pm.sample_posterior_predictive(trace, samples=500)

You are right. There appear to be some missing features for dealing with covariance realizations. You can generate tensors of the right shape easily:

    eye = np.eye(5)
    nu = 20

    with pm.Model() as mod:
        w = pm.Wishart('w', nu=nu, V=eye, shape=(5, 5))
        trace = pm.sample_prior_predictive(12)

    print(trace['w'].shape)
    # prints: (12, 5, 5)


but inference is broken: WishartBartlett does not appear to accept the observed keyword (???), while Wishart breaks if observed has more than 2 dimensions. So (note especially the commented-out, failing line) you have to rely on a dumb hack:


    with pm.Model() as mod2:
        nu_offset = pm.HalfNormal('nu_o', 3)
        nu = pm.Deterministic('nu', 20 + nu_offset)
        V_p = pm.WishartBartlett('V_p', nu=20, S=eye, is_cholesky=True)
        # w2 = pm.Wishart('w2', nu=nu, V=V_p, observed=trace['w'])  ***doesn't work!!***
        # the hack: one Wishart likelihood per observed matrix, all sharing nu and V_p
        for hck in range(12):
            w2 = pm.Wishart('w%d' % hck, nu=nu, V=V_p, shape=(5, 5), observed=trace['w'][hck])
        trace2 = pm.sample(500, cores=2, chains=4)

    pm.traceplot(trace2, varnames=['V_p_c']);



This worked brilliantly on my data, thank you!! Of course, I hoped for a cleaner solution, but having an unlimited supply of random covariance matrices is definitely worth it. I did a quick comparison of the matrix traces between the original and the generated covariances:

[figure: matrix traces; blue and orange are the data and random traces, respectively]

[figure: comparison of one of the off-diagonal elements]

The modes are reasonably close and the shapes are somewhat different, but of course the data are not random. It is still much closer than I would ever get by manually tinkering with distribution parameters. Now I just have to understand the role of the offset in nu; it is clearly necessary for NUTS to work. I would never have guessed this, thank you for the insight.

Gergely

For a $k \times k$ scale matrix $\Sigma$, $\mathrm{Wishart}(\Sigma, \nu)$ is only defined for $\nu > k - 1$. I had been playing around with the dimensions, so I put an offset in to ensure I wouldn’t inadvertently sample from undefined likelihoods. The “right” way to do this would be with pm.Bound, I think.
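
A minimal sketch of that pm.Bound idea (the prior parameters here are made up; for a $5 \times 5$ matrix the bound keeps $\nu$ above $k - 1 = 4$):

    import pymc3 as pm

    # bounded prior on the degrees of freedom, so the Wishart logp stays defined
    BoundedNormal = pm.Bound(pm.Normal, lower=4.0)

    with pm.Model() as mod:
        nu = BoundedNormal('nu', mu=20.0, sd=5.0)  # made-up hyperparameters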

I am bumping this conversation because there is a subtle issue with the model/data described previously. When I use a non-informative prior for the degrees of freedom, the posterior settles at around 9, but posterior prediction indicates that the off-diagonal components vary too much.

[figure: histogram of the ratio between the minimum and maximum eigenvalues for the data (blue) and the predictions (orange)]
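
(The statistic was computed along these lines, with bigmtx being the stack of data matrices from earlier:)

    # min/max eigenvalue ratio for each matrix in the (N, 3, 3) stack
    ratios = np.array([np.linalg.eigvalsh(m).min() / np.linalg.eigvalsh(m).max()
                       for m in bigmtx])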

[figure: traces of the data and of the predictions, respectively]
I guess that as the d.o.f. increases, the magnitude of the diagonal increases too, which decreases the likelihood, but for the off-diagonal elements a higher d.o.f. would be desirable. I wonder if there is a variant of the Wishart likelihood where the elements are normalized by the degrees of freedom?
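
One mean-preserving reparameterization that might achieve this: since $E[\mathrm{Wishart}(\nu, V)] = \nu V$, setting $V = S/\nu$ keeps the expected matrix at $S$ while $\nu$ only controls the spread. A sketch (the choice of $S$ here is only illustrative, and the single observed matrix mirrors the earlier workaround):

    with pm.Model() as model:
        nu = pm.Uniform('nu', 4, 100)  # safely above k - 1 = 2 for 3x3 matrices
        S = bigmtx.mean(axis=0)        # illustrative target scale
        # dividing the scale by nu keeps E[cov] = S fixed as nu grows,
        # so a larger nu tightens the draws instead of inflating the diagonal
        cov = pm.Wishart('cov', nu=nu, V=S / nu, observed=bigmtx[0])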