I’m tinkering around with probabilistic PCA and I’m having all kinds of convergence problems. I didn’t add any constraints to the model to guarantee a unique solution for the factor matrix, so I’m wondering whether that causes multimodality or something similar. Does anyone have experience with this?

Example code is attached, thanks!

PPCA.py (1.4 KB)

# Unique solution for probabilistic PCA

**gBokiau**#2

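Well yes, the joys of Bayesian PCA. This is a well-known problem: Bayesian PCA models have no unique solution. Without constraints, the solutions are at best symmetric (a factor and its sign-flipped mirror fit equally well), at worst identical under any rotation of the factor matrix, and in every case subject to label switching (the order of the factors is arbitrary).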
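To make the non-identifiability concrete: in the standard marginal formulation of probabilistic PCA, integrating out the latent scores gives x ~ N(μ, W Wᵀ + σ²I), and W Wᵀ is unchanged when W is replaced by W R for any orthogonal R. That includes continuous rotations, sign flips and column permutations. The sketch below checks this numerically for a zero-mean model. It assumes the marginal formulation rather than whatever the attached PPCA.py does, and the dimensions and noise variance are arbitrary choices.

```python
import numpy as np

def ppca_loglik(X, W, sigma2):
    """Marginal log-likelihood of zero-mean PPCA: x ~ N(0, W W^T + sigma2 I)."""
    n, d = X.shape
    C = W @ W.T + sigma2 * np.eye(d)
    _, logdet = np.linalg.slogdet(C)
    # sum_i x_i^T C^{-1} x_i, using a linear solve instead of an explicit inverse
    quad = np.sum(X * np.linalg.solve(C, X.T).T)
    return -0.5 * (n * d * np.log(2 * np.pi) + n * logdet + quad)

rng = np.random.default_rng(0)
d, k, n = 5, 2, 100
W = rng.normal(size=(d, k))
X = rng.normal(size=(n, d))

# Three kinds of orthogonal k x k matrices
Q, _ = np.linalg.qr(rng.normal(size=(k, k)))  # random rotation/reflection
sign_flip = np.diag([1.0, -1.0])              # flip the sign of factor 2
perm = np.eye(k)[:, [1, 0]]                   # swap the two factors

for R in (Q, sign_flip, perm):
    # Same likelihood for W and W @ R: the data cannot distinguish them
    print(np.isclose(ppca_loglik(X, W, 0.5), ppca_loglik(X, W @ R, 0.5)))
```

If the model samples the latent scores Z explicitly instead, the same argument applies with Z replaced by Z R alongside W R.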

If you plan to apply this to real-life data, one approach I have found successful is to sidestep the rotation problem by forcing the factor matrix to be very sparse. I have used ‘witch hat’ priors to that effect (a mixture of two Gaussians, one with a very large sigma and one with a near-zero sigma). This forces the model to find an ‘optimal’ rotation under which the factor matrix consists mostly of zeros (a desirable property for most analyses).
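One way to see why the sparsity prior helps: the likelihood cannot tell W from W R, but an element-wise ‘witch hat’ prior (essentially a continuous spike-and-slab mixture) can. The sketch below evaluates such a mixture log-density on a sparse loading matrix and on a 45° rotation of it. Both have the same W Wᵀ, and hence the same likelihood, but the sparse version gets a much higher prior density. The spike and slab standard deviations (0.01 and 10) and the 50/50 mixing weight are illustrative assumptions, not values from this thread.

```python
import numpy as np

def normal_logpdf(x, sigma):
    """Log-density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * (x / sigma) ** 2

def witch_hat_logpdf(W, spike_sd=0.01, slab_sd=10.0, p_slab=0.5):
    """Sum of element-wise log-densities under a two-component Gaussian mixture."""
    log_spike = np.log1p(-p_slab) + normal_logpdf(W, spike_sd)  # near-0 sigma
    log_slab = np.log(p_slab) + normal_logpdf(W, slab_sd)       # very large sigma
    return np.sum(np.logaddexp(log_spike, log_slab))

# A sparse loading matrix: each variable loads on a single factor
W_sparse = np.array([[1.0, 0.0],
                     [0.8, 0.0],
                     [0.0, 1.2],
                     [0.0, 0.9]])

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
W_rotated = W_sparse @ R  # same W W^T, so same likelihood, but every entry is nonzero

print(witch_hat_logpdf(W_sparse), witch_hat_logpdf(W_rotated))
```

The mixture is symmetric in sign and identical for every column, so it breaks the continuous rotation invariance but not the sign or label ambiguity, which is why those still have to be dealt with separately.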

This doesn’t solve the symmetry and component identity (label switching) issues, however. The most elegant solution to those is simply to change the signs and the order of the columns manually after inference, so that the factor matrices (and scores) match across chains, and then re-run the convergence tests on the altered matrices.
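The matching step can be partly automated. A minimal sketch, assuming each chain’s loadings are available as NumPy arrays: take one chain’s posterior-mean factor matrix as the reference; for each other chain, search over column permutations and per-column sign flips for the combination that best matches the reference (here, maximising the summed absolute column-wise dot products); then apply that same permutation and those signs to every draw of the chain’s loadings and scores. In real use you would call `match_columns(W_draws.mean(axis=0), W_ref)` per chain. `match_columns` and `align_chain` are hypothetical helper names, and the brute-force search over k! permutations is only practical for a handful of factors.

```python
import itertools
import numpy as np

def match_columns(W, W_ref):
    """Find the column permutation and sign flips that best map W onto W_ref.

    Brute force over all k! permutations, so only suitable for small k.
    Returns (perm, signs) such that W[:, perm] * signs approximates W_ref.
    """
    k = W.shape[1]
    best_score, best_perm, best_signs = -np.inf, None, None
    for p in itertools.permutations(range(k)):
        p = list(p)
        inner = np.sum(W[:, p] * W_ref, axis=0)  # column-wise dot products
        score = np.sum(np.abs(inner))            # best sign is chosen per column
        if score > best_score:
            best_score, best_perm = score, p
            best_signs = np.where(inner < 0, -1.0, 1.0)
    return best_perm, best_signs

def align_chain(W_draws, Z_draws, perm, signs):
    """Apply one permutation/sign pattern to every draw of loadings and scores.

    W_draws: (n_draws, d, k) loadings; Z_draws: (n_draws, n, k) latent scores.
    """
    return W_draws[..., perm] * signs, Z_draws[..., perm] * signs

rng = np.random.default_rng(1)
W_ref = rng.normal(size=(6, 3))  # stands in for chain 0's posterior mean

# Another chain that found the same solution up to relabelling and signs
W_other = -W_ref[:, [2, 0, 1]]
W_other[:, 1] *= -1

perm, signs = match_columns(W_other, W_ref)
print(perm, signs)

# Apply the same relabelling to every draw of that chain
W_draws = np.stack([W_other] * 10)      # (10, 6, 3), stand-in for sampler draws
Z_draws = rng.normal(size=(10, 20, 3))  # hypothetical scores, (10, 20, 3)
W_aligned, Z_aligned = align_chain(W_draws, Z_draws, perm, signs)
```

This only works when the factors are well separated; if two columns are nearly collinear, the best match can change from chain to chain and the alignment itself becomes unreliable. For larger k, an assignment solver on the matrix of absolute dot products would replace the brute-force loop.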

Finally, if you’re interested in large-scale approximation, I have had some success using ADVI with the method described in this paper.

Happy to elaborate on any of the above, or to point you towards literature (specific or general), if you have specific questions or cases.

To summarise: by design, your example will not, and should not, converge.

Thanks, that’s all very interesting. I had just been exploring Bayesian PCA and Bayesian factor analysis after reading this paper on causal inference, so I don’t have any urgent real-life application for this, but it’s fascinating to learn about.

I’m trying to wrap my mind around this. By “manually”, do you literally mean looking through the traces/chains and determining where label switching occurred, or is there an automated way to do this? Can you point me to where I can read more or, if you’re feeling generous, amend @twiecki’s example to illustrate what you mean? (When I run that example with multiple chains, I get label switching, as you might expect.)

Thanks in advance for any guidance you can provide.

**gBokiau**#5

If the factors are sufficiently contrasted, label and sign switching will typically only occur across chains, not within a single trace. If they’re not, the result *will* tend to be one big mush. So yes, I do literally mean manually matching, reordering and sign-correcting the factors in the matrices *across traces* in order to apply convergence tests. Treating them this way presupposes some convergence to begin with, so the formal tests are mostly a formality at that point. They do, however, let you pinpoint which specific factors (weights) are more or less stable, which in turn lets you select the data points that produce more contrasted, telling factors.
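To illustrate the last point, here is a sketch of running a per-weight convergence diagnostic before and after the manual correction. It uses simulated draws, not output from a real sampler: four chains scatter around the same loading matrix, but chain 1 has every sign flipped and chain 2 has its two factors swapped, i.e. switching only across chains, as described above. The diagnostic is a plain NumPy split-R-hat (the classic between/within-chain variance form, without rank normalisation); in practice you would use a library implementation such as ArviZ’s `rhat` on the aligned draws. All names and sizes are illustrative.

```python
import numpy as np

def split_rhat(x):
    """Split-R-hat without rank normalisation.

    x has shape (n_chains, n_draws); each chain is split in half so that
    drift within a chain also inflates the statistic.
    """
    half = x.shape[1] // 2
    chains = np.concatenate([x[:, :half], x[:, half:2 * half]], axis=0)
    n = chains.shape[1]
    between = n * chains.mean(axis=1).var(ddof=1)  # B: between-chain variance
    within = chains.var(axis=1, ddof=1).mean()     # W: mean within-chain variance
    var_hat = (n - 1) / n * within + between / n
    return np.sqrt(var_hat / within)

def rhat_grid(draws):
    """Split-R-hat for every loading; draws has shape (n_chains, n_draws, d, k)."""
    d, k = draws.shape[2:]
    return np.array([[split_rhat(draws[:, :, i, j]) for j in range(k)]
                     for i in range(d)])

# Simulated posterior: all chains agree up to sign and factor order
rng = np.random.default_rng(2)
n_chains, n_draws, d, k = 4, 500, 5, 2
W_true = rng.normal(size=(d, k))
draws = W_true + 0.05 * rng.normal(size=(n_chains, n_draws, d, k))
draws[1] = -draws[1]                   # chain 1: every sign flipped
draws[2] = draws[2][..., ::-1].copy()  # chain 2: factors swapped

rhat_before = rhat_grid(draws)

# Manual correction: undo the sign flip and the swap chain by chain
aligned = draws.copy()
aligned[1] = -aligned[1]
aligned[2] = aligned[2][..., ::-1].copy()

rhat_after = rhat_grid(aligned)
print(rhat_before.max().round(2), rhat_after.max().round(3))
```

Before alignment, the between-chain variance dominates and R-hat is far above 1 for essentially every weight; after alignment it sits close to 1. With real draws, the weights whose R-hat stays elevated after alignment are the less stable ones.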