Unique solution for probabilistic PCA

@ferrine

In the Minibatch case, all of the model parameters come out basically correct, except that each parameter is now estimated from a distribution of observations (as sampled in the minibatches).

In a sense, this seems similar to what would happen if each row in my full dataset were actually an average of some randomly sampled rows.

The main problem is that the correlation matrix is not estimated properly.

Here’s a comparison of the correlation matrix error between the full dataset and the minibatch model. Basically, the minibatch model (right side) underestimates most of the correlations (because they’re all essentially set to zero):
[image: correlation matrix error, full dataset vs. minibatch (right side)]
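
For reference, here is a minimal sketch of how a correlation matrix error like this could be computed, assuming NumPy. The helpers `cov_to_corr` and `corr_error` are hypothetical names (not part of the model code), the data is synthetic, and the identity matrix is only a stand-in for a model whose correlations have collapsed to zero.

```python
import numpy as np

def cov_to_corr(cov):
    """Convert a covariance matrix to a correlation matrix."""
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

def corr_error(corr_est, corr_ref):
    """Return the element-wise error (estimate minus reference)
    and the mean absolute error over the off-diagonal entries."""
    err = corr_est - corr_ref
    off_diag = ~np.eye(err.shape[0], dtype=bool)
    return err, np.abs(err[off_diag]).mean()

# Synthetic data with known correlations (illustrative values only)
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.8, 0.3],
                     [0.8, 1.0, 0.5],
                     [0.3, 0.5, 1.0]])
data = rng.multivariate_normal(np.zeros(3), true_cov, size=5000)
corr_ref = np.corrcoef(data, rowvar=False)

# Stand-in for an estimate in which all correlations are zero
corr_zero = np.eye(3)
err, mae = corr_error(corr_zero, corr_ref)
print(err.round(2))
print("mean abs off-diagonal error:", mae)
```

With positive true correlations, an estimate that sets them all to zero gives negative off-diagonal errors, which is the underestimation pattern described above.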

The only thing I can think of is that the parameters “s” and “l_std” are estimated differently between the two models:
Full dataset:
[image: estimates of “s” and “l_std”]

Minibatch:
[image: estimates of “s” and “l_std”]