Proposing some new distributions for pymc_extras

Over the long weekend, I dug into the BPCA topic again and implemented Nirwan et al.’s “Rotation Invariant Householder Parameterization for Bayesian PCA”, which involved three distributions of potentially wider interest:

  1. A uniform-ish distribution over direction vectors, i.e. a spherically symmetric distribution.
    This basically means putting a prior on the Euclidean norm of the vector. There is a longer discussion of what prior to put on it in the Stan forums, but I’m thinking of implementing it generically so the prior can be given as a parameter (see the sketch after this list).
  2. A uniform distribution over all (semi-)orthogonal matrices, i.e. the Stiefel manifold. This follows Mezzadri, “How to generate random matrices from the classical compact groups”, and basically boils down to sampling random unit vectors and using them to build a sequence of Householder transformations.
  3. The distribution of the singular values of a matrix with i.i.d. standard normal entries (equivalently, up to a square root, the distribution of the eigenvalues of Wishart matrices). This one is adapted from James & Lee, “Concise Probability Distributions of Eigenvalues of Real-Valued Wishart Matrices”.

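To make (1) concrete, here is a minimal sketch of how it could look as a plain PyMC model, before any packaging into a distribution class. The names (`raw`, `direction`, `norm_prior`, `chi_logp`) are made up for illustration, and Gamma(50, 50) is just one possible norm prior (it comes up again below); `chi_logp` is the implicit density of the norm of a standard normal vector:

```python
import pymc as pm
import pytensor.tensor as pt


def chi_logp(r, nu):
    # Log-density of the Chi distribution with nu degrees of freedom,
    # i.e. the implicit distribution of the Euclidean norm of a
    # standard normal nu-vector.
    return (
        (nu - 1) * pt.log(r)
        - 0.5 * r**2
        - (nu / 2 - 1) * pt.log(2)
        - pt.gammaln(nu / 2)
    )


d = 5  # dimension of the direction vector

with pm.Model():
    # An isotropic normal vector: its direction is uniform on the sphere.
    raw = pm.Normal("raw", 0.0, 1.0, shape=d)
    norm = pt.sqrt(pt.sum(raw**2))
    direction = pm.Deterministic("direction", raw / norm)

    # Swap the implicit Chi(d) density of the norm for an explicit prior
    # (here Gamma(50, 50)) by adding the log-density ratio as a Potential.
    pm.Potential(
        "norm_prior",
        pm.logp(pm.Gamma.dist(alpha=50.0, beta=50.0), norm) - chi_logp(norm, d),
    )
```
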
I have working implementations for all three, although they will definitely need some cleanup and generalization before they can be added.

I have created a draft PR into pymc-extras, but @zaxtax recommended I ask for comments here as well.

My two main questions are:
a) Do all three make sense to add?
b) How should I name the distributions?

Any thoughts and comments welcome :slight_smile:

  1. You can build this up directly in PyMC out of normal variates. Just generate a standard normal random N-vector y \sim \text{normal}(0, 1), then set x = y / \lVert y \rVert. This induces a uniform distribution over unit vectors x. The space is compact, so it’s a proper density. The density from the multivariate normal acts like the Jacobian, though it’s not technically a Jacobian (see: Constraint Transforms in the Stan Reference Manual for more details).

  2. This is tricky because of the geometry of orthonormal matrices. It’s easy enough to generate a uniform distribution over orthonormal matrices, with two main approaches in addition to Givens rotations (both sketched after this list):

    1. Fill a random matrix with standard normal draws, QR-decompose it, and take the Q factor. This will be uniformly distributed over orthonormal matrices. QR will usually use Householder reflections under the hood, so this may be what your reference is doing.
    2. Fill a skew-symmetric matrix with standard normal draws, then matrix-exponentiate it. This will also be uniformly distributed over orthonormal matrices.

    Either way, the space is compact and you get a proper transformed distribution, just as in the unit vector case. The problem that arises is that there is no proper diffeomorphism because of the disconnect between rotations (determinant = 1) and reflections (determinant = -1). You can fix this with some row swapping if you only want rotations or reflections, but I don’t know what that’ll do to autodiff.

  3. What would you use this for? I know people like to use spectral priors for matrices in some cases, like the LKJ prior for correlation matrices, which penalizes small determinants (and hence strong correlations). But if that’s all you need to do, it’s easier to go the other way and just build a matrix up from (2) plus a distribution over eigenvalues.
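
For illustration (re: the two approaches in 2), here is a rough NumPy/SciPy sketch of both constructions; the sign correction on Q in the first one is the fix from the Mezzadri paper cited above, without which the QR output is not exactly uniform:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 4

# (a) QR of a standard normal matrix; rescale each column of Q by the
# sign of the corresponding diagonal entry of R (Mezzadri's correction).
A = rng.standard_normal((n, n))
Q, R = np.linalg.qr(A)
Q = Q * np.sign(np.diag(R))

# (b) Matrix exponential of a skew-symmetric matrix; exp(S) with
# S = -S^T is always orthonormal, with determinant +1 (a rotation).
S = rng.standard_normal((n, n))
S = S - S.T
Q2 = expm(S)

for M in (Q, Q2):
    assert np.allclose(M @ M.T, np.eye(n))
```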

Thank you for the thorough reply, @bob-carpenter!

You are not wrong, but the problem with this approach is that the origin is a singularity in this geometry, and using standard normals still puts a fair amount of probability mass near it. So the idea is to use a distribution that drops off fast towards zero, like Gamma(50, 50) (or an InverseGamma) instead of the implicit Chi(d). The Stan thread explains it well, and a younger (and maybe more handsome?) version of you even recommended making it a StanCon paper in that thread :slight_smile:
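
To put rough numbers on that, here is a quick scipy comparison (with d = 3 picked arbitrarily) of how much mass the implicit Chi(d) norm and a Gamma(50, 50) replacement put near the origin:

```python
from scipy import stats

d = 3
chi = stats.chi(df=d)                    # implicit norm of a standard normal d-vector
gamma = stats.gamma(a=50, scale=1 / 50)  # mean 1, sd ~ 0.14, vanishing mass near 0

print("Chi(3):        P(r < 0.3) =", chi.cdf(0.3))    # ~ 7e-3
print("Gamma(50, 50): P(r < 0.3) =", gamma.cdf(0.3))  # ~ 6e-13
```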

Unfortunately I don’t speak differential geometry, but if I understand you correctly, you are pointing out that there are two kinds of symmetries, rotations and reflections, and that while this handles the first, the geometry is not great at handling the second. Which is very true. The papers I’ve read all basically say “just fix it in postprocessing”, and as that is rather straightforward to do for BPPCA, I’m inclined not to see it as too much of an issue.

In practice there is also a third issue: permutational near-symmetries. Namely, I run into the situation where, if the initial starting point is bad, the last two vectors of the R matrix in the QR decomposition switch, especially if the diagonal element is small in absolute value. This is considerably more insidious, as it is a local maximum that is not equivalent to the global maximum, unlike the reflection symmetries. This is less of an issue for the SVD parameterization in the Nirwan paper, although there the issue is just transferred to the singular value distribution.

Despite these problems, the distribution is still useful, as it at least removes one class of symmetries, so adding it to PyMC still feels valuable. Do you agree?

This is a distribution over eigenvalues (or, strictly speaking, over their square roots as currently implemented), so I’m not sure I follow what you mean by this comment. Can you elaborate?

I have a strong accent :slight_smile:

A square matrix M is orthonormal if multiplying by its own transpose produces the identity, M \cdot M^\top = \textrm{I}. That implies that each row has a unit dot product with itself and a zero dot product with every other row. The natural transforms via QR decomposition or matrix exponentiation produce the full space of orthonormal matrices.

There are two ways a matrix can be orthonormal: it can be a rotation or a reflection. Rotations have determinant 1 and reflections have determinant -1. If you swap two rows of a matrix (a discontinuous operation), you convert one into the other. So it’s easy enough to randomly generate either just rotations or just reflections. For example, to generate a random rotation, generate an orthonormal matrix, and if it has determinant -1, swap two rows. But it’s harder to model just the rotation matrices as a data type because of the discontinuity between reflections and rotations.
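
In NumPy, that fix looks like this (which two rows you swap is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Uniform orthonormal matrix via QR (with the usual sign correction).
Q, R = np.linalg.qr(rng.standard_normal((n, n)))
Q = Q * np.sign(np.diag(R))

# If Q is a reflection (det = -1), swapping two rows flips the sign of
# the determinant, turning it into a rotation (det = +1).
if np.linalg.det(Q) < 0:
    Q[[0, 1]] = Q[[1, 0]]

assert np.isclose(np.linalg.det(Q), 1.0)
```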

Physical processes often involve unknown rotations, e.g., in cryo-EM reconstruction, but it’s hard to model a rotation continuously with a parameter. This is probably related to the row swapping.

I don’t really have enough experience with either the maintenance costs for PyMC or these transforms to be a good judge.

I think it’d be great just to have a transform that lets you do orthonormal matrices, but I don’t know how to do that. It’d be even better if you could figure out how to restrict to just rotations (which the literature calls the special orthogonal group and writes as SO(N) in N dimensions, with SO(3) being regular old 3D spatial rotations).

I was surprised that there’s nothing in oryx for that, at least that I could see among their dozens of bijectors. geotorch has an implementation of the general orthonormal case in geotorch.orthogonal.
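
Here is a minimal usage sketch going by geotorch’s README (I haven’t exercised it beyond this):

```python
import torch
import geotorch

layer = torch.nn.Linear(5, 5)
geotorch.orthogonal(layer, "weight")  # constrain layer.weight to the orthogonal manifold

W = layer.weight
# Columns are orthonormal (see the warning below about the transpose).
print(torch.allclose(W.T @ W, torch.eye(5), atol=1e-5))  # True
```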

geotorch’s developer, Mario Lezcano Casado, was really helpful when he visited and I inquired about this particular transform; he was patient enough to derive why the matrix exponential does what it does.

Warning: geotorch.orthogonal works in the transpose, so it is the columns that are unit vectors, and it generates orthonormal vectors, not just orthogonal ones (i.e., the columns are mutually orthogonal and all unit length).