Another one to dig up this old thread. I’m a little late to the party, but this may be a germane discussion. I’m looking more generally at the factor analysis model `Y_i ~ N(B F_i, D); F_i ~ N(0, H)`, and I thought to identify a unique factorization by the following (the invariances involved are sketched just after the list):

- Set loadings to lower-triangular (no more rotation invariance)
- Set diagonal to positive (no more sign invariance)
- Order diagonal (no more permutation invariance)
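
To spell out the indeterminacy these constraints are meant to remove (a quick sketch assuming `H = I`, i.e. i.i.d. standard-normal factors, as in the code below): the marginal covariance is

```
\operatorname{Cov}(Y_i) = B H B^{\top} + D = B B^{\top} + D,
\qquad (B R)(B R)^{\top} = B B^{\top} \ \text{for any orthogonal } R,
```

so `B` is only identified up to rotation, which in particular includes column sign flips and column permutations.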

This would look something like:

```
import numpy as np
import pymc3 as pm
import theano.tensor as tt

def expand_packed_block_triangular(n, k, packed, diag=None):
    # like expand_packed_triangular, but for a tall (n x k) block with n >= k
    assert n >= k
    out = tt.zeros((n, k), dtype=float)
    if diag is None:
        idxs = np.tril_indices(n, m=k)
        out = tt.set_subtensor(out[idxs], packed)
    else:
        idxs = np.tril_indices(n, k=-1, m=k)
        out = tt.set_subtensor(out[idxs], packed)
        idxs = (np.arange(k), np.arange(k))
        out = tt.set_subtensor(out[idxs], diag)
    return out

ZeroTruncNormal = pm.Bound(pm.Normal, lower=0.0)

# using `obs` data from this thread
c = 5  # observed dimensions
n = N  # number of observations in `obs`
k = 2  # latent factors

with pm.Model() as bfa:
    # factors are i.i.d. standard normal
    F = pm.Normal('F', 0., 1., shape=(k, n))  # latent vars
    # loadings are lower block-triangular with a positive, ordered diagonal
    Wod = pm.Normal('W_packed_od', 0., 10., shape=(k * c - k * (k - 1) // 2 - k,))
    Wd = ZeroTruncNormal('W_packed_d', mu=1., sd=3., shape=(k,),
                         transform=pm.distributions.transforms.Ordered(),
                         testval=1. + 1 * np.arange(k))
    W = pm.Deterministic('W', expand_packed_block_triangular(c, k, Wod, Wd))
    L = pm.Deterministic('L', tt.dot(W, F))
    d = pm.HalfCauchy('err_d', 5.)
    # condition on the `obs` data; far faster than a full MvNormal
    Capprox = pm.Normal('Capprox', mu=L, sd=d, shape=(c, n), observed=obs)
    traces = [pm.sample(200, chains=1, init='adapt_diag') for _ in range(6)]
```
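
For reference, here is a quick NumPy-only check (my own sketch, outside the model) of how the packed vectors map into the `c x k` block-triangular loading matrix; the off-diagonal packed length matches the `k*c - k*(k-1)//2 - k` shape used for `W_packed_od`:

```
import numpy as np

c, k = 5, 2
# strictly-lower-triangular indices of the c x k block
# (same indexing as the `else` branch of the helper above)
od_idx = np.tril_indices(c, k=-1, m=k)
print(len(od_idx[0]))  # 7 == k*c - k*(k-1)//2 - k

W = np.zeros((c, k))
W[od_idx] = np.arange(1, len(od_idx[0]) + 1)  # packed off-diagonal entries
W[np.arange(k), np.arange(k)] = [10., 20.]    # positive, ordered diagonal
print(W)
```

The diagonal entries land at `(0, 0)` and `(1, 1)`, which are exactly the entries the positive/ordered constraint is applied to.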

I’m finding that it doesn’t quite work right, with NUTS producing over-broad posteriors for the loadings that still switch between modes:

```
import seaborn as sbn

# overlay the posterior of each ordered diagonal loading (one color per
# component) for every chain; overlapping colors indicate switching
colors = ['blue', 'red', 'green', 'yellow', 'black', 'magenta', 'orange']
for tr in traces:
    for j in range(2):
        sbn.kdeplot(tr['W_packed_d'][:, j], color=colors[j])
```

and with ADVI routinely giving `inf` values for the loss and producing broad estimates well into infeasible regions of the model.

So, are there better ways of modeling these constraints? Or is the symmetric approach far superior?