Hi, I’m working on an IRT problem with 2 dimensions and I’d like to force those dimensions to be orthogonal. My first attempt is to use LKJCorr with a large eta, and this does produce the desired behavior, but I’m curious whether there is a smarter way to go about this in PyMC. Would appreciate any suggestions, thanks!
I don’t quite understand the context of your problem, but do you want something like this? (In this case the vectors are orthonormal, but you could make them merely orthogonal too.)
import pymc as pm
import numpy as np

with pm.Model():
    # Two independent standard-normal 2-vectors (identity covariance)
    vec = pm.MvNormal("mv", mu=np.zeros(2), cov=np.eye(2), size=2)
    X = vec[0, :]
    Y = vec[1, :]

    # Gram-Schmidt: normalize X, then remove its component from Y and normalize
    normX = pm.math.sqrt((X**2).sum())
    Z = X / normX
    W = Y - (Y * Z).sum() * Z
    normW = pm.math.sqrt((W**2).sum())
    W = W / normW

    # Sanity checks: unit norms and zero dot product
    print((W * W).sum().eval())
    print((Z * Z).sum().eval())
    print((W * Z).sum().eval())
What do you mean by “orthogonal”? If you set eta to a large number on LKJCholeskyCov, you’re just pushing the correlation matrix toward the identity, i.e. a diagonal covariance, which you could specify directly. But an MvNormal with a diagonal covariance is equivalent to independent normals, so you don’t even need an MvNormal.
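Just to illustrate that equivalence with a minimal snippet (not your model, just the two priors side by side):

import numpy as np
import pymc as pm

with pm.Model():
    # A diagonal covariance factorizes, so this MvNormal...
    a = pm.MvNormal("a", mu=np.zeros(2), cov=np.eye(2))
    # ...defines exactly the same distribution as two independent normals.
    b = pm.Normal("b", 0.0, 1.0, shape=2)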
Thanks for the responses. Here’s some additional detail I should have included.
I’d like to get 2 vectors of latent traits whose posteriors are uncorrelated. I have n_respondent people, each with 2 latent traits, and a set of items broken into 2 groups. By assumption, trait 1 influences all items but trait 2 influences only a subset. With this structure, the posterior mean correlation between the factors tends to be pretty close to zero with independent normal priors or an MvNormal prior with any value of eta, but depending on the amount of data I’m fitting to, there can be a fair amount of posterior density on nonzero correlations, which I want to eliminate. Here’s what that looks like with MvNormal and eta=1:
By increasing eta I can concentrate the posterior closer to 0 as desired, but I’m not sure whether there is a better way to accomplish the same thing.
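For concreteness, here is roughly the structure I’m working with. The sizes, the items_on_trait2 index set, and the fake responses below are placeholders just so the sketch runs, not my real data:

import numpy as np
import pymc as pm

# Placeholder sizes and fake binary responses
n_resp, n_items = 200, 20
items_on_trait2 = np.arange(10, 20)   # the subset of items that also load on trait 2
y = np.random.binomial(1, 0.5, size=(n_resp, n_items))

with pm.Model():
    # Correlated latent traits via an LKJ prior; a large eta pushes the
    # correlation matrix toward the identity
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=2, eta=1.0, sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )
    theta = pm.MvNormal("theta", mu=np.zeros(2), chol=chol, shape=(n_resp, 2))

    # Trait 1 loads on every item; trait 2 only on the masked subset
    mask2 = np.zeros(n_items)
    mask2[items_on_trait2] = 1.0
    a1 = pm.HalfNormal("a1", 1.0, shape=n_items)
    a2 = pm.HalfNormal("a2", 1.0, shape=n_items) * mask2
    b = pm.Normal("b", 0.0, 1.0, shape=n_items)

    logit_p = theta[:, 0:1] * a1 + theta[:, 1:2] * a2 + b
    pm.Bernoulli("obs", logit_p=logit_p, observed=y)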
If all you want is zero correlation between effects, then model each effect independently.
I can give the latent vectors independent normal priors, but that doesn’t do what I’m looking for, which is that the vectors are defined in such a way that each individual realization of vector 1 is orthogonal to vector 2. It looks like this could be done by generating independent normals and then applying a QR decomposition: Random Orthogonal Matrices - #7 by simonbyrne - Optimization (Mathematical) - Julia Programming Language. I’ll give that a try, thanks!
You can do a QR decomposition on anything and the columns of Q will be orthogonal. But the result will no longer be independent and normally distributed; all the location and scale information about the original distribution ends up in R.
I have to think about it a bit more, but this may be fine. Since these are latent constructs, the location and scale information is arbitrary (I really only care about relative differences between individuals), and in the IRT setup there are item slopes and intercepts that translate the location and scale of the latent factors to the location and scale of the observed data. I might need to modify the priors for those slopes and intercepts to make sure they still make sense with QR’ed factors, though.
If you take a matrix of standard normal draws and apply a QR decomposition, the distribution induced over Q will be uniform over all orthogonal matrices. You can get both rotations (determinant = 1) and reflections (determinant = -1), so if you really want to stick to the special orthogonal manifold of rotations only, you have to somehow eliminate the determinant = -1 alternatives.
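Here’s a small NumPy illustration of that construction; the sign fix using the diagonal of R is one common way to make the QR map well defined:

import numpy as np

rng = np.random.default_rng(0)

# QR of a standard normal matrix: the columns of Q are orthonormal,
# and fixing the signs by the diagonal of R makes Q uniform over the
# orthogonal group
A = rng.standard_normal((2, 2))
Q, R = np.linalg.qr(A)
Q = Q * np.sign(np.diag(R))

print(Q @ Q.T)            # ~ identity
print(np.linalg.det(Q))   # +1 or -1; restrict further if you only want rotations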
If you then try to impose a different distribution on top of that uniform, you run into two problems. First, you have to do a change of variables adjustment before you can apply the different distribution, and second, you induce a lot of constraint geometry and dependence through the orthonormal constraint (e.g., if one component gets bigger, the sum of the squares of the other ones has to go down by an equal amount, inducing a lot of correlation).
Alternatively, if you start with a skew symmetric matrix with random normal entries and put that through a matrix exponential, you also get uniformly distributed orthonormal matrices. But it has the same problems and I believe is not as stable arithmetically as QR.
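A quick sketch of that alternative, just to show the construction:

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
S = A - A.T    # skew-symmetric: S.T == -S
Q = expm(S)    # the matrix exponential of a skew-symmetric matrix is orthogonal

print(Q @ Q.T)           # ~ identity
print(np.linalg.det(Q))  # always +1, so this lands in the rotation group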
Thanks for that info, Bob. It looks like in that Julia forum answer I linked, which is elaborated on a bit in this paper https://arxiv.org/pdf/math-ph/0609050, they handle the determinant issue by multiplying Q by the sign of the diagonal of R. I gave this a try, but PyMC says “ValueError: Model can not be sampled with NUTS alone. Your model is probably not continuous.”, which I guess is due to the sign function. In any case, I think I will stick with LKJ with a large eta for now, since that does not seem to have any issue sampling with NUTS. Thanks all!
You may need the latest version of PyTensor/PyMC; we only implemented the gradients for QR recently.
Gotcha, ok, yes, I was a couple of months behind, and with the latest version it samples. The QR approach, using the method described in the paper in my last post to handle the determinant, appears to sample a bit better than plain independent normal or multivariate normal priors. Thank you!
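In case it helps anyone later, the core of what I’m doing looks roughly like this (n_resp is a placeholder, and I’m assuming a recent PyTensor where the QR op has gradients):

import numpy as np
import pymc as pm
import pytensor.tensor as pt
from pytensor.tensor.nlinalg import qr

n_resp = 200  # placeholder number of respondents

with pm.Model():
    # Unconstrained standard normal draws, one row per respondent, two columns
    raw = pm.Normal("raw", 0.0, 1.0, shape=(n_resp, 2))

    # Thin QR: the two columns of Q are orthogonal by construction
    Q, R = qr(raw, mode="reduced")

    # Sign fix from the linked paper so the decomposition is well defined
    Q = Q * pt.sign(pt.diagonal(R))

    theta = pm.Deterministic("theta", Q)  # orthogonal latent factor columns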