Unique solution for probabilistic PCA

Another one to dig up this old thread. I’m a little late to the party, but this may be a germane discussion. I tried to identify a unique factorization (I’m looking more generally at a factor analysis: Y_i ~ N(B F_i, D); F_i ~ N(0, H)) with the following constraints:

  1. Set loadings to lower-triangular (no more rotation invariance)
  2. Set diagonal to positive (no more sign invariance)
  3. Order diagonal (no more permutation invariance)
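For intuition, here's a quick NumPy sketch (my own, not from the thread) of the invariances these constraints remove. Any orthogonal rotation R leaves the implied covariance B B^T unchanged; an LQ-style decomposition plus sign flips picks out the lower-triangular, positive-diagonal representative (ordering the diagonal then removes the remaining permutation freedom):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

c, k = 5, 2
B = rng.normal(size=(c, k))              # unconstrained loadings
R = np.linalg.qr(rng.normal(size=(k, k)))[0]  # random orthogonal rotation
B_rot = B @ R

# rotation invariance: B and B @ R imply the same covariance
assert np.allclose(B @ B.T, B_rot @ B_rot.T)

# constraint 1: lower-triangular representative via QR of B.T (B = L_tri Q^T)
Q, Rfac = np.linalg.qr(B.T)
L_tri = Rfac.T                           # (c, k) block lower-triangular

# constraint 2: flip column signs so the diagonal is positive
signs = np.sign(np.diag(L_tri))
L_tri = L_tri * signs

# same implied covariance as the original B
assert np.allclose(L_tri @ L_tri.T, B @ B.T)
assert np.all(np.diag(L_tri) > 0)
```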

this would look something like

import numpy as np
import pymc3 as pm
import theano.tensor as tt

def expand_packed_block_triangular(n, k, packed, diag=None):
    # like pm.expand_packed_triangular, but for an n x k block with n >= k
    assert n >= k
    out = tt.zeros((n, k), dtype=float)
    if diag is None:
        idxs = np.tril_indices(n, m=k)
        out = tt.set_subtensor(out[idxs], packed)
    else:
        idxs = np.tril_indices(n, k=-1, m=k)
        out = tt.set_subtensor(out[idxs], packed)
        idxs = (np.arange(k), np.arange(k))
        out = tt.set_subtensor(out[idxs], diag)
    return out
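In case it helps anyone checking the packing order, here's a NumPy-only analogue (names are mine) that makes the layout easy to eyeball; `np.tril_indices` fills strictly-lower entries in row-major order:

```python
import numpy as np

def expand_packed_block_triangular_np(n, k, packed, diag=None):
    # NumPy mirror of the Theano version above, for inspecting the layout
    assert n >= k
    out = np.zeros((n, k))
    if diag is None:
        out[np.tril_indices(n, m=k)] = packed
    else:
        out[np.tril_indices(n, k=-1, m=k)] = packed
        out[np.arange(k), np.arange(k)] = diag
    return out

c, k = 5, 2
n_od = k * c - k * (k - 1) // 2 - k   # strictly-lower entry count: 7 here
W = expand_packed_block_triangular_np(c, k, np.arange(1, n_od + 1),
                                      diag=[10., 20.])
# W:
# [[10.  0.]
#  [ 1. 20.]
#  [ 2.  3.]
#  [ 4.  5.]
#  [ 6.  7.]]
```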

ZeroTruncNormal = pm.Bound(pm.Normal, lower=0.0)

# using `obs` data from this thread

c = 5
n = N  # number of observations in `obs`
k = 2
with pm.Model() as bfa:
    # factors are normally i.i.d.
    F = pm.Normal('F', 0., 1., shape=(k, n)) # latent vars
    # loadings are block lower-triangular; strictly-lower entries unconstrained
    Wod = pm.Normal('W_packed_od', 0., 10., shape=(k*c - k*(k-1)//2 - k,))
    # diagonal entries are positive (truncation) and ordered (transform)
    Wd = ZeroTruncNormal('W_packed_d', mu=1., sd=3., shape=(k,),
                         transform=pm.distributions.transforms.Ordered(),
                         testval=1. + np.arange(k))
    W = pm.Deterministic('W', expand_packed_block_triangular(c, k, Wod, Wd))
    L = pm.Deterministic('L', tt.dot(W, F))
    d = pm.HalfCauchy('err_d', 5.)
    Capprox = pm.Normal('Capprox', mu=L, sd=d, observed=obs, shape=(c, n))  # far faster
    
    traces = [pm.sample(200, chains=1, init='adapt_diag') for _ in range(6)]
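As a sanity check on the model's implied structure (my own sketch, assuming H = I and D = d^2 I as in the code above), the marginal covariance of the data should be W W^T + d^2 I; simulating from the generative process recovers it:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

c, k, d = 5, 2, 0.5
W = np.tril(rng.normal(size=(c, k)))     # block lower-triangular loadings
n_sim = 200_000

F = rng.normal(size=(k, n_sim))          # factors ~ N(0, I)
Y = W @ F + d * rng.normal(size=(c, n_sim))

implied = W @ W.T + d**2 * np.eye(c)
empirical = np.cov(Y)                    # rows of Y are variables
assert np.max(np.abs(empirical - implied)) < 0.1
```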

I’m finding that it doesn’t quite work: NUTS yields overly broad loadings that label-switch between runs:

import seaborn as sbn
colors = ['blue', 'red', 'green', 'yellow', 'black', 'magenta', 'orange']

for tr in traces:
    for j in range(2):
        sbn.kdeplot(tr['W_packed_d'][:, j], color=colors[j])
        

[image: KDE of the diagonal loadings across the six runs]

ADVI, meanwhile, routinely gives `inf` values for the loss and generates broad estimates well into infeasible regions of the model:

[image: ADVI loading estimates extending into infeasible regions]

So, are there better ways of modeling these constraints? Or is the symmetric approach far superior?