Hey guys, just catching up here with a few notes.
@dycontri your model seems ill-defined. To be clear, latent_like is the noise in your model (the part that cannot be explained by the latent factors), so s is that noise's spread (all fine so far). However, you are missing intercepts, and since l_std and w are both multipliers, they cannot be uniquely identified: scaling one up and the other down by the same constant leaves the likelihood unchanged. You should drop l_std and increase the sigma on w instead. The intercepts, fwiw, are not subject to sign flipping and should not be in the same group (a simple MeanField would do).
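To make the multiplicative non-identifiability concrete, here is a small numpy sketch; the shapes and variable names are illustrative assumptions, not taken from your actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=(100, 2))       # latent factors
w = rng.normal(size=(2, 5))         # weights
l_std = 1.7                         # factor scale multiplier

mu_a = (l_std * z) @ w              # one parameterization
c = 3.0
mu_b = ((l_std / c) * z) @ (c * w)  # rescaled parameterization

# Both parameterizations give the exact same predicted mean,
# hence the exact same likelihood -- the pair (l_std, w) is not
# uniquely identified, only their product is.
assert np.allclose(mu_a, mu_b)
```

This is why folding l_std into w's prior sigma loses nothing: the data can only ever see the product.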
Further, because you are using a two-dimensional latent space, the model is subject not only to sign flipping but also to label switching: factors 1 and 2 are perfectly interchangeable. Moreover, although modelling the weights as Gaussians mitigates this, any orthogonal rotation of the whole weights-and-factors space can still yield equally suitable parameters for the model. You therefore need a way to uniquely define the factors, i.e. to 'fix' (render unique) the angle of the orthogonal rotation that will suit the model (nb: it would be helpful to see the resulting mean weights in matrix form; I suspect that one of the factors might simply have all its weights close to 0, i.e. it "collapsed"). To fix the rotation, it is common (albeit not ideal) to force an increasing number of weights to 0 for each additional latent dimension, making the weight matrix triangular. In your case, this would translate to forcing w[0,0] = 0.
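Here is a quick numpy demonstration of the rotation invariance (again with made-up shapes): any orthogonal rotation of the factors, paired with the inverse rotation of the weights, reproduces the data identically, and label switching is just a special case of this.

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=(100, 2))   # latent factors
w = rng.normal(size=(2, 5))     # weights

theta = 0.8                     # an arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Rotating the factors by R and the weights by R.T leaves the
# reconstruction (and hence the likelihood) unchanged, because
# z @ R @ R.T @ w == z @ w for any orthogonal R.
assert np.allclose(z @ w, (z @ R) @ (R.T @ w))

# Label switching is the special case where R is a permutation,
# and sign flipping the special case R = -I.
P = np.array([[0., 1.], [1., 0.]])
assert np.allclose(z @ w, (z @ P) @ (P.T @ w))
```

Forcing weights to zero in a triangular pattern removes the free angle theta, which is exactly why that constraint pins down a single rotation.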
@ferrine you seem to be confused by the random generator being the same as for MeanField, but that is exactly the point. The complete group of weights and latent factors is bimodal overall but conditionally unimodal given any one of its points, so this symmetrization essentially settles on one of the two solutions and then scales the logq to keep the KL divergence consistent with reality.
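To illustrate what I mean by "scales the logq", here is a toy one-dimensional sketch (not PyMC's actual implementation, just the idea): the symmetrized density is a two-component mirror mixture, and once the approximation has settled near one mode, the mirrored component is negligible, so the symmetrization amounts to shifting logq by -log(2).

```python
import numpy as np

def log_norm_pdf(x, mu, sigma=1.0):
    """Log density of a Normal(mu, sigma) at x."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2

mu = 4.0   # the two symmetric modes sit at +mu and -mu
x = 4.1    # a sample that has settled near the +mu mode

# Symmetrized density: q_sym(x) = 0.5 * (q(x) + q(-x)),
# computed in log space for numerical stability.
logq_sym = np.logaddexp(log_norm_pdf(x, mu), log_norm_pdf(-x, mu)) - np.log(2)

# Near one mode the mirrored component is negligible, so the
# symmetrization just shifts logq by -log(2), which is the scaling
# that keeps the KL divergence consistent with the bimodal posterior.
assert np.isclose(logq_sym, log_norm_pdf(x, mu) - np.log(2), atol=1e-4)
```

So the sampler looks identical to a plain MeanField one, and only the logq correction carries the information that there are really two symmetric solutions.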