Another one to dig up this old thread — I’m a little late to the party, but this may be a germane discussion. I thought to identify a unique factorization (I’m looking more generally at a factor analysis: Y_i ~ N(B F_i, D); F_i ~ N(0, H)) by the following:
- Set loadings to lower-triangular (no more rotation invariance)
- Set diagonal to positive (no more sign invariance)
- Order diagonal (no more permutation invariance)
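The rotation invariance being removed is easy to check numerically: since the factors are standard normal, the marginal covariance B Bᵀ + D is unchanged when B is multiplied on the right by any orthogonal R. A quick NumPy sketch (the values of `c`, `k`, `B`, and `D` here are arbitrary, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
c, k = 5, 2
B = rng.normal(size=(c, k))                    # loadings
D = np.diag(rng.uniform(0.1, 1.0, size=c))     # diagonal noise covariance

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # orthogonal 2x2 rotation

cov1 = B @ B.T + D
cov2 = (B @ R) @ (B @ R).T + D
print(np.allclose(cov1, cov2))  # True: the likelihood can't tell B from B @ R
```

Hence the triangular/positive/ordered constraints above, which pick out one member of each equivalence class.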
This would look something like:

```python
import numpy as np
import theano.tensor as tt


def expand_packed_block_triangular(n, k, packed, diag=None):
    # like expand_packed_triangular, but with n > k
    assert n >= k
    out = tt.zeros((n, k), dtype=float)
    if diag is None:
        idxs = np.tril_indices(n, m=k)
        out = tt.set_subtensor(out[idxs], packed)
    else:
        # fill strictly-below-diagonal entries, then the diagonal separately
        idxs = np.tril_indices(n, k=-1, m=k)
        out = tt.set_subtensor(out[idxs], packed)
        idxs = (np.arange(k), np.arange(k))
        out = tt.set_subtensor(out[idxs], diag)
    return out
```
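As a sanity check on the index bookkeeping, here is a plain-NumPy mirror of the helper above (same `tril_indices` calls, no symbolic tensors; the packed values are arbitrary):

```python
import numpy as np

def expand_packed_block_triangular_np(n, k, packed, diag=None):
    # NumPy version of the Theano helper, for checking the index layout
    out = np.zeros((n, k))
    if diag is None:
        out[np.tril_indices(n, m=k)] = packed
    else:
        out[np.tril_indices(n, k=-1, m=k)] = packed
        out[np.arange(k), np.arange(k)] = diag
    return out

c, k = 5, 2
n_od = k * c - k * (k - 1) // 2 - k   # number of below-diagonal entries: 7
W = expand_packed_block_triangular_np(c, k, np.arange(1., n_od + 1), diag=[10., 20.])
print(W)
```

The packed entries land row by row below the diagonal block, and the diagonal gets its own slots — matching the count `k*c - k*(k-1)//2 - k` used for the off-diagonal prior below.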
```python
ZeroTruncNormal = pm.Bound(pm.Normal, lower=0.0)

# using `obs` data from this thread
c = 5
n = N
k = 2
with pm.Model() as bfa:
    # factors are i.i.d. standard normal
    F = pm.Normal('F', 0., 1., shape=(k, n))  # latent vars
    # loadings are lower block-triangular with a positive, ordered diagonal
    Wod = pm.Normal('W_packed_od', 0., 10., shape=(k*c - k*(k - 1)//2 - k,))
    Wd = ZeroTruncNormal('W_packed_d', mu=1., sd=3., shape=(k,),
                         transform=pm.distributions.transforms.Ordered(),
                         testval=1. + 1. * np.arange(k))
    W = pm.Deterministic('W', expand_packed_block_triangular(c, k, Wod, Wd))
    L = pm.Deterministic('L', tt.dot(W, F))
    d = pm.HalfCauchy('err_d', 5.)
    Capprox = pm.Normal('Capprox', mu=L, sd=d, observed=obs)  # far faster
    traces = [pm.sample(200, chains=1, init='adapt_diag') for _ in range(6)]
```
I’m finding that it doesn’t quite work right: NUTS yields over-broad loadings, which lead to label switching:
```python
import seaborn as sbn

colors = ['blue', 'red', 'green', 'yellow', 'black', 'magenta', 'orange']
for tr in traces:
    for j in range(2):
        sbn.kdeplot(tr['W_packed_d'][:, j], color=colors[j])
```
ADVI, meanwhile, routinely gives `inf` values for the loss and generates broad estimates well into the infeasible regions of the model.
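One way I'd try to tell relabeling apart from genuine drift across the NUTS chains (a sketch with synthetic draws standing in for the real `W_packed_d` samples): compare per-chain posterior means of the diagonal loadings, and check whether sorting each chain's means reconciles them. If it does, the disagreement is a pure permutation of labels.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two fake "chains" targeting the same diagonal (1, 3), one with labels swapped
chain_a = rng.normal([1.0, 3.0], 0.1, size=(200, 2))
chain_b = rng.normal([3.0, 1.0], 0.1, size=(200, 2))

means = np.array([chain_a.mean(axis=0), chain_b.mean(axis=0)])
# Sorting each chain's means aligns them iff the disagreement is relabeling
aligned = np.sort(means, axis=1)
print(np.allclose(aligned[0], aligned[1], atol=0.05))  # True => swap, not drift
```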
So, are there better ways of modeling these constraints? Or is the symmetric approach far superior?