In case anyone stumbles on this question:
There is actually a theoretically sound identification strategy based on the LQ decomposition.
To briefly sum it up: if we have a matrix of weights W and a matrix of scores S, and only the product W @ S is constrained by the data, then W and S are individually unidentified up to scale, permutation, rotation and sign flips (W can compensate for any such change in S and vice versa). Putting normal priors on the elements of both gets rid of the scale issue, but permutations, rotations and sign flips remain.
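To see the rotation problem concretely, here is a tiny NumPy sketch (the dimensions and the way I build an orthogonal Q via a QR decomposition are just for illustration): any rotated pair (W @ Q.T, Q @ S) reproduces W @ S exactly, so the likelihood cannot tell them apart.

import numpy as np

rng = np.random.default_rng(0)
od, fd, n = 10, 5, 300

W = rng.normal(size=(od, fd))   # weights
S = rng.normal(size=(fd, n))    # scores

# Any orthogonal Q yields an equally good (W', S') pair
Q, _ = np.linalg.qr(rng.normal(size=(fd, fd)))
W_rot, S_rot = W @ Q.T, Q @ S

# The product, which is all the likelihood sees, is unchanged
print(np.allclose(W @ S, W_rot @ S_rot))  # True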
However, if you take the LQ factorization of W, so that W = L @ Q, you end up with W @ S = L @ Q @ S.
Q is orthogonal, so we can simply absorb it into S: S' = Q @ S. If the elements of S are i.i.d. standard normal, S is rotation-invariant and Q @ S has the same distribution as S.
L is lower triangular, and as D. Leung and M. Drton prove in "Order-invariant prior specification in Bayesian factor analysis", if the original W was i.i.d. normal, then the strictly lower-triangular entries of L are also i.i.d. normal and the diagonal entries follow Chi distributions (they don't name the distribution, but that is what it is), with fd - j + 1 degrees of freedom for the j-th diagonal entry when there are fd factors. That is where the nu argument in the code below comes from.
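If you want to convince yourself of this numerically, here is a small standalone check. NumPy has no LQ routine, so I get L from the QR decomposition of W.T (W = L @ Q is equivalent to W.T = Q.T @ L.T); the specific dimensions and Monte Carlo size are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
od, fd, reps = 10, 5, 5000

diags, lowers = [], []
for _ in range(reps):
    W = rng.normal(size=(od, fd))
    # LQ of W via QR of the transpose: W.T = Qr @ R  =>  W = R.T @ Qr.T = L @ Q
    Qr, R = np.linalg.qr(W.T)
    L = R.T
    # QR is only unique up to signs; flip columns so the diagonal of L is positive
    L = L * np.sign(np.diag(L))
    diags.append(np.diag(L))
    lowers.append(L[1, 0])  # one strictly lower-triangular entry

diags = np.array(diags)
print((diags ** 2).mean(axis=0))        # ~ [5, 4, 3, 2, 1], since E[chi2_nu] = nu
print(np.mean(lowers), np.std(lowers))  # ~ 0 and ~ 1, i.e. standard normal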
So, with this, identification becomes very easy.
import pymc as pm, pytensor.tensor as pt
import numpy as np
import pymc_extras as pmx
# Output dimension, real dimension, factor dimension, number of samples
od,rd,fd,n = 10,7,5,300
# Create a dataset to test the FA
real_mat = np.random.normal(0,1,size=(rd,od))
real_scores = np.random.normal(0,1,size=(n,rd))
obs = real_scores @ real_mat + np.random.normal(0,0.1,size=(n,od))
with pm.Model() as model:
    # Create the W matrix
    tu_inds = np.triu_indices(n=fd, m=od, k=1)
    wvals = pm.Normal('wvals', size=len(tu_inds[0]))
    wdvals = pmx.Chi('wdvals', size=fd, nu=fd - np.arange(fd))
    W = pt.zeros((fd, od))
    W = pt.set_subtensor(W[tu_inds], wvals)
    W = pt.set_subtensor(W[np.arange(fd), np.arange(fd)], wdvals)
    pm.Deterministic('W', W)
    # Non-marginalized model that directly gives the scores
    scores = pm.Normal('scores', mu=0, sigma=1, shape=(n, fd))
    sd = pm.HalfNormal('sd', sigma=0.5)
    pm.Normal('obs', mu=scores @ W, sigma=sd, shape=(n, od), observed=obs)
    # Alt: Marginalized model that leaves scores implicit
    # pm.MvNormal('obs', mu=0, cov=W.T @ W + sd**2 * pt.eye(od), shape=(n, od), observed=obs)
    idata = pm.sample(tune=2000)
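Not strictly necessary, but a quick sanity check on the fit: compare the marginal covariance implied by the posterior means, W.T @ W + sd**2 * I (the same expression as in the marginalized variant above), against the empirical covariance of obs. They should be close if the factor model explains the data.

W_mean = idata.posterior['W'].mean(dim=('chain', 'draw')).values    # (fd, od)
sd_mean = idata.posterior['sd'].mean(dim=('chain', 'draw')).values  # scalar

implied_cov = W_mean.T @ W_mean + sd_mean**2 * np.eye(od)
empirical_cov = np.cov(obs, rowvar=False)

# Largest absolute discrepancy between the two covariance matrices
print(np.abs(implied_cov - empirical_cov).max())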
And to check if all chains converged to the same solution:
import pandas as pd
from plotnine import *
# Create a random map of the loading columns into two dimensions
random_map = np.random.normal(0, 1, size=(fd, 2))
pvals = np.swapaxes(idata.posterior.W.values, -1, -2) @ random_map
# Reshape pvals into a long-form dataframe with category columns
df_long = pd.DataFrame([
    {
        'chain': chain_idx,
        'point': point_idx,
        'x': pvals[chain_idx, draw_idx, point_idx, 0],
        'y': pvals[chain_idx, draw_idx, point_idx, 1],
    }
    for chain_idx in range(pvals.shape[0])
    for draw_idx in range(pvals.shape[1])
    for point_idx in range(pvals.shape[2])
])
ggplot(df_long,aes(x='x',y='y',color='factor(point)')) + geom_point(alpha=0.3) + facet_wrap('~chain')
Usually, they do, and the resulting plot shows the same clusters of projected points in every chain's facet.
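And if you prefer a numeric diagnostic over eyeballing facets: because W is now identified, plain R-hat values on its elements should sit near 1 whenever the chains really did land on the same solution (ArviZ comes with PyMC).

import arviz as az

print(az.summary(idata, var_names=['W'])['r_hat'].max())  # ~1.0 when all chains agree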