Unique solution for probabilistic PCA

I think this effect can be explained as follows:

  • you say your posterior has two modes
  • in practice you sample from only one of them
  • in the KL divergence, both modes are covered
  • so it does not matter which mode you end up sampling from

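As for the minibatch dataset: did you specify total_size for your observed variable? Without it, the minibatch log-likelihood is not rescaled to the full dataset size, so the prior is weighted too heavily relative to the data.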
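To illustrate why total_size matters, here is a minimal pure-Python sketch. It is not PyMC code: the helper names (gaussian_logp, minibatch_loglike) and the toy Gaussian model are assumptions made for illustration. It only mimics the rescaling by total_size / batch_size that a minibatch log-likelihood needs in order to approximate the full-data log-likelihood.

```python
import math
import random


def gaussian_logp(x, mu, sigma=1.0):
    """Log-density of a single observation under Normal(mu, sigma)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * ((x - mu) / sigma) ** 2


def minibatch_loglike(batch, mu, total_size=None):
    """Sum of log-densities over a minibatch.

    If total_size is given, rescale by total_size / len(batch) so the result
    estimates the full-data log-likelihood (hypothetical helper, for illustration).
    """
    ll = sum(gaussian_logp(x, mu) for x in batch)
    if total_size is None:
        return ll
    return ll * total_size / len(batch)


random.seed(0)
data = [random.gauss(1.0, 1.0) for _ in range(10_000)]  # toy "full" dataset
batch = random.sample(data, 100)                        # one minibatch

mu = 0.5  # arbitrary parameter value at which to compare the log-likelihoods
full = sum(gaussian_logp(x, mu) for x in data)
unscaled = minibatch_loglike(batch, mu)
scaled = minibatch_loglike(batch, mu, total_size=len(data))

print("full data:      ", full)
print("batch, unscaled:", unscaled)
print("batch, scaled:  ", scaled)
```

The unscaled value is roughly 100 times too small here, so any prior term added to it would dominate. In PyMC3 the corresponding step would be passing total_size=N to the observed variable that receives the minibatch.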