LKJCholeskyCov shape argument

Hi there,

I’m attempting to implement a model wherein I estimate a covariance matrix, using the LKJCholeskyCov prior, for each participant. For other distributions (e.g., Beta), I can use the ‘shape=’ argument to estimate multiple parameters at a time (i.e., one for each participant in my study). However, the ‘shape=’ argument seems to have no effect on the output of the LKJCholeskyCov prior (i.e., no matter the shape argument, the output is always a single covariance matrix). I’m relatively new to this software, so any help would be most appreciated.

Best,
pmnewm01


AFAIK that’s as per the design: LKJCholeskyCov can only handle a single covariance matrix (2D, but not 3D).

Also, having a different prior covariance matrix for each observation strikes me as perhaps too many degrees of freedom in the model: like giving each observation its own intercept. That would make it hard to generalise (learn) much about the data. What’s your model structure such that you need that much freedom?

EDIT for clarity: you can certainly have a different cov matrix per observation, but I don’t immediately see where you would need a different prior for each observation: the typical thing would be to give them all the same prior.

Each participant has 60 data points. I need to estimate a covariance matrix for each participant. Each data point is a 2D vector with (x, y) position.

Righto - but do they need different priors? (which is what LKJCholeskyCov controls)

For example, this snippet would give you a 2D covariance matrix of 2 features for each of n_obs observations, but with a shared prior:

import pymc3 as pm
import theano.tensor as tt

# a single shared prior over the 2x2 Cholesky factor
sd_dist = pm.InverseGamma.dist(alpha=101., beta=100.)
chol, corr_, stds_ = pm.LKJCholeskyCov('lkjcc', n=2, eta=2.,
                                       sd_dist=sd_dist, compute_corr=True)

# one 2D MvNormal per observation, all sharing chol; y is an (n_obs, 2) data array
mvn_dist = pm.MvNormal.dist(mu=tt.zeros(2), chol=chol, shape=(n_obs, 2))
mvn_like = pm.Potential('mvn_like', mvn_dist.logp(y))

I’m a little confused. In your snippet, it appears that chol will only be one 2x2 covariance matrix. I’m not exactly sure what MvNormal.dist is doing; more specifically, I’m not sure what the ‘.dist’ component does. And why pm.Potential?

In the GLM hierarchical linear regression example in the documentation, alpha and beta are both distributed as normal dists with shape = n_countries, which returns different alpha and beta values for each country. Why is something similar not done with pm.LKJCholeskyCov as written in your example? I see you’ve added shape=(n_obs, 2) to pm.MvNormal.dist. What does this function return exactly? When I try running this code, it tells me mvn_dist.shape is equal to [n_obs, 2].

Here is the (not working) model to give you a better idea of what I’m trying to do. ‘invP’ is the inverse of a known covariance matrix ‘sigmaP’. ‘invL’ is the inverse of the covariance matrix (one for each participant) that I’m trying to estimate. ‘muij’ is the center of a distribution with variability ‘tau^2’, computed using these inverse matrices (invP and invL). ‘targets’ is an n_subjects x n_trials(=60) x 2 array containing observed (x, y) values that were sampled from a multivariate normal with mean ‘muP’ and cov ‘sigmaP’. The subject has to memorize the target’s location and reproduce it from memory. ‘endpts’ is an n_subjects x n_trials x 2 array containing the observed responses that I’m fitting to.

The first issue I don’t know how to tackle is to compute ‘muij’ for each participant and trial. I’m getting an error at this step. Also, as per my previous reply, I still don’t understand how this model is computing a separate ‘chol’ (and ultimately ‘invL’) for each participant. It looks to me like only one covariance matrix is estimated.

with pm.Model() as PiN:

    # (assumes numpy as np, pymc3 as pm, theano.tensor as tt are imported)
    sd_dist = pm.InverseGamma.dist(alpha=1., beta=1.)
    chol, corr_, stds_ = pm.LKJCholeskyCov('lkjcc', n=2, eta=2.,
                                           sd_dist=sd_dist, compute_corr=True)
    cov = pm.Deterministic('cov', chol.dot(chol.T))

    # inverse of the known covariance, and of the estimated covariance
    # (note: inverting cov itself, not its Cholesky factor chol)
    invP = pm.Deterministic('invP', tt.as_tensor_variable(np.linalg.inv(sigmaP)))
    invL = pm.Deterministic('invL', pm.math.matrix_inverse(cov))

    # this is the step that currently errors out
    μij = pm.Deterministic('μij', pm.math.dot(
        pm.math.matrix_inverse(invP + invL),
        pm.math.dot(invP, muP) + pm.math.dot(invL, tt.as_tensor_variable(targets[subs_idx]))))

    τ = pm.HalfNormal('τ', sd=1)
    τ_cov = pm.Deterministic('τ_cov', τ**2 * tt.as_tensor_variable(np.identity(2)))

    mvn_dist = pm.MvNormal.dist(mu=μij, cov=τ_cov, shape=(nsubs, 2))
    sij = pm.Potential('sij', mvn_dist.logp(endpts[1, subs_idx]))

Ah yes, I dived right in there 🙂

The .dist and Potential( logp() ) combination is a fairly common alternative pymc3 idiom to replace ‘observed=’. I pasted it here from a model I had to hand, in which I found that observed= led to errors. You can likely safely ignore this idiom for now and use the observed= pattern, e.g. Multivariate — PyMC 5.10.0 documentation
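For reference, the equivalent using observed= would be something like this (an untested sketch; y is assumed to be an (n_obs, 2) array of data, with pm and tt imported as above):

with pm.Model() as m:
    sd_dist = pm.InverseGamma.dist(alpha=101., beta=100.)
    chol, corr_, stds_ = pm.LKJCholeskyCov('lkjcc', n=2, eta=2.,
                                           sd_dist=sd_dist, compute_corr=True)
    # shape is inferred from observed; each row of y is one 2D observation
    mvn_like = pm.MvNormal('mvn_like', mu=tt.zeros(2), chol=chol, observed=y)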

More generally, in my example I have a different MvNormal for each of the n_obs observations, and they share a common prior: the Cholesky-decomposed covariance matrix.

If you wanted to pick a normally distributed value of alpha and also of beta, per observation, then this structure would give you that. And the LKJCholeskyCov prior would ensure that these values covary.
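Concretely, a hypothetical sketch (the names here are illustrative, not from a real model):

# one (alpha, beta) pair per observation, all pairs sharing the LKJ prior
coefs = pm.MvNormal('coefs', mu=tt.zeros(2), chol=chol, shape=(n_obs, 2))
alpha = coefs[:, 0]  # per-observation alpha values
beta = coefs[:, 1]   # per-observation beta values, covarying with alpha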

Ah okay, interesting - so does your data look like this?

participant_id  trial_id  x_val  y_val
0               0         2      3
0               1         2.5    3.5
0               2         2      3.2
1               0         7      3
1               1         6      4

(… skipping rows for trial_id → 59)

That indeed is not what the examples assume: they assume a single prior covariance matrix for the entire set of observations.

Can I assume that you want to calculate a prior covariance matrix for each participant, created from their sequence of trials? If so, you’d probably need to do some looping so as to get multiple LKJCholeskyCovs into your model.

But then again, are you really sure you want to estimate a different prior covariance matrix per participant? This would imply that Alice has a uniquely different covariance to Bob, and it wouldn’t let you generalise anything about these people such that you could make predictions about Carol, etc.


Thanks for your help, Jonsedar. I understand the .dist and Potential( logp() ) idioms now.

Yes, your depiction of the dataset is accurate. Part of my goal with this model is to examine individual variation in the covariance matrix. Thus, I want to estimate the covariance matrix (chol) for each individual participant. Will I need to loop over participants for this using something like:

for i in range(n_subs):
    chol, corr_, stds_ = pm.LKJCholeskyCov('lkjcc_{}'.format(i), n=2, eta=2.,
                                           sd_dist=sd_dist, compute_corr=True)

Although not a particularly elegant solution, it would probably work?

The other main question I have pertains to the calculation of ‘uij’. Assuming ‘invL’ is 2 by 2 and ‘targets’ is 2 by 60 (trials) per participant, how exactly should I write the line of code for this variable such that it computes ‘uij’ for each trial given ‘i’? I’m not so confident in my understanding of how PyMC3 handles indexing (i.e., targets[subs_idx], where subs_idx is a flattened array of unique participant_ids with length num_participants * num_trials).

A quick crossref to the ever-knowledgeable @junpenglao would suggest that yes, it’s common practice to loop the LKJ priors: LKJCholeskyCov input dependent
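As an untested sketch of that looping approach, collecting the per-participant factors as you go (stacking with tt.stack is my assumption about how you’d combine them downstream):

chols = []
for i in range(n_subs):
    # a fresh sd_dist per participant; each LKJ prior needs a unique name
    sd_dist_i = pm.InverseGamma.dist(alpha=1., beta=1.)
    chol_i, _, _ = pm.LKJCholeskyCov('lkjcc_{}'.format(i), n=2, eta=2.,
                                     sd_dist=sd_dist_i, compute_corr=True)
    chols.append(chol_i)
chol_all = tt.stack(chols)  # (n_subs, 2, 2); chol_all[i] is participant i's factor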

I hear you on the not-quite-elegant bit, and to jump on from Junpeng’s note: with your participants x trials dataset, I hazard that you might be able to treat this as a pm.MatrixNormal, though TBH I’m not familiar with them in practice. I found a good explanation here: Matrix Variate Normal Distributions with MixMatrix
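Purely speculative, but for a single participant i it might look something like this (untested; mu_i, chol_i and endpts[i] are assumed to have shapes (n_trials, 2), (2, 2) and (n_trials, 2)):

# rows (trials) independent, columns (x, y) sharing the LKJ-prior covariance
obs_i = pm.MatrixNormal('obs_{}'.format(i), mu=mu_i,
                        rowcov=tt.eye(n_trials), colchol=chol_i,
                        observed=endpts[i], shape=(n_trials, 2))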

Re uij: if you’re setting a different chol per participant (is this i?), then wouldn’t you have a uj? FWIW you can probably index 2D if you want u[i][j], but that’ll probably be a level of complication you don’t need.


Yes, you are right, uij would just become uj. However, the pm.Deterministic call that I am assigning to ‘uj’ attempts to multiply ‘invL’ by the entire 2 by 60 ‘targets’ matrix. What I want it to do is compute ‘uj’ for each column (trial) of that matrix. Any idea how this can be implemented?

Sorry, back after a brief hiatus for project work. Did you get any further with this?
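In case it’s still useful, one untested thought on the per-column question: a plain dot already applies a 2x2 matrix to every column of a 2x60 matrix, so the main wrinkle is giving the length-2 vector a broadcastable column axis. Here invL_i and targets_i are per-participant stand-ins for your variables:

A = pm.math.matrix_inverse(invP + invL_i)   # (2, 2)
b = pm.math.dot(invP, muP)                  # (2,)
# b[:, None] makes b a (2, 1) column, broadcasting it over the 60 trials
u_j = pm.math.dot(A, b[:, None] + pm.math.dot(invL_i, targets_i))  # (2, 60)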

I was able to get the model running in JAGS. Gave up on implementing it in PyMC3. Thanks a lot for your help throughout.

Ah, well, I’m glad to hear you got something working! Consider posting the JAGS code here if you like - it might help the next person with your issue. It might be clearer how to translate it into pymc3 too 🙂

Cheers