Unique solution for probabilistic PCA

Well yes, the joys of Bayesian PCA. This is a well-known problem: Bayesian PCA has no unique solution. Without constraints, the solutions are at best symmetric (identical up to sign flips), at worst identical under any rotation, and in any case subject to label switching.

If you plan to apply this to real-life data, one approach that I have found successful is to sidestep the rotation problem by forcing the factor matrix to be very sparse. I have used 'witch hat' priors to that effect (a mixture of two Gaussians, one with a very large sigma, one with a near-zero sigma). This pushes the model towards an 'optimal' rotation under which the factor matrix is mostly zeros (a desirable property for most analyses).
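For concreteness, here is a minimal sketch of what such a sparse-loading prior could look like, assuming a PyMC-style model. The dimensions, mixture weights, and scales below are placeholder assumptions for illustration, not the exact prior from any particular analysis:

```python
import numpy as np
import pymc as pm

# Hypothetical sizes and data, purely for illustration
N, D, K = 200, 10, 3
X = np.random.randn(N, D)

with pm.Model() as sparse_ppca:
    # 'Witch hat' prior on each loading: a mixture of a near-zero-sigma
    # Gaussian (spike) and a wide Gaussian (slab). Most loadings get
    # pulled towards zero, which pins down a sparse rotation.
    W = pm.NormalMixture(
        "W",
        w=np.array([0.9, 0.1]),           # arbitrary spike/slab weights
        mu=np.zeros(2),
        sigma=np.array([0.05, 5.0]),      # near-zero vs very large sigma
        shape=(D, K),
    )
    Z = pm.Normal("Z", 0.0, 1.0, shape=(N, K))   # latent scores
    noise = pm.HalfNormal("noise", 1.0)
    pm.Normal("obs", mu=Z @ W.T, sigma=noise, observed=X)
```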

This doesn’t solve the sign symmetry and component identity (label switching) issues, however. The most practical solution to those is to simply flip signs and reorder columns manually after inference, so that the factor matrices (and scores) match across chains, and then re-run convergence tests on those aligned matrices.
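A rough sketch of that post-hoc alignment, as a hypothetical NumPy helper (greedy column matching against a reference draw; your own matching criterion may differ):

```python
import numpy as np

def align_loadings(W, W_ref):
    """Permute columns of a D x K loading matrix W and flip their signs
    so they line up with a reference matrix W_ref, using correlation as
    a similarity measure. Apply to every posterior draw, then re-run
    convergence diagnostics on the aligned draws."""
    D, K = W.shape
    aligned = np.empty_like(W)
    used = set()
    for k in range(K):
        # Correlation of reference column k with each unused candidate column
        corrs = np.array([
            np.nan if j in used
            else np.corrcoef(W_ref[:, k], W[:, j])[0, 1]
            for j in range(K)
        ])
        j = int(np.nanargmax(np.abs(corrs)))
        sign = 1.0 if corrs[j] >= 0 else -1.0
        aligned[:, k] = sign * W[:, j]
        used.add(j)
    return aligned
```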

Finally, if you’re interested in large-scale approximation, I have had some success using ADVI with the method described in this paper.
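As a rough pointer, fitting the model above with ADVI would look something like the following in PyMC; this is just the generic variational interface and does not reproduce the paper-specific method:

```python
# Assuming the sparse_ppca model from the sketch above
with sparse_ppca:
    approx = pm.fit(n=50_000, method="advi")   # mean-field ADVI
    idata = approx.sample(1_000)               # draws from the approximation
```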

Happy to elaborate on any of the above or point you towards literature (specific or general) if you have specific questions or cases.

To summarise: your example will not converge, and by design it should not.
