This is pretty much outside of my knowledge, so I will just provide some comments:
- If I understand correctly, when $\mu_i$ is small, that component does not contribute much to the final mixture; that's why its stick size (i.e. the component weight) is near 0. This is common when the model has more components/features than the data need.
- When you are doing sampling, I don't think conjugacy is important. Without it you lose the closed-form conditional updates, but you can still design an efficient sampler for your problem.
- It is referring to other free parameters in the model, since it basically adds a new mixture component/feature to the model.
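To illustrate the first point, here is a minimal sketch assuming a truncated Dirichlet-process (GEM) stick-breaking construction, which may not be exactly the model in your question: each weight is $\pi_k = \beta_k \prod_{l<k}(1-\beta_l)$ with $\beta_k \sim \mathrm{Beta}(1, \alpha)$. The function name and truncation level are my own choices for illustration. Later sticks break off an ever smaller remainder, so surplus components end up with weights near 0.

```python
import numpy as np

def stick_breaking_weights(alpha, num_components, rng):
    """Truncated stick-breaking weights pi_k = beta_k * prod_{l<k} (1 - beta_l)."""
    betas = rng.beta(1.0, alpha, size=num_components)
    # Length of stick left before breaking piece k: 1, (1-b_1), (1-b_1)(1-b_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return betas * remaining

rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=1.0, num_components=20, rng=rng)
print(np.round(weights, 3))

# Averaged over many draws, E[pi_k] = (1/(1+alpha)) * (alpha/(1+alpha))**(k-1),
# i.e. it halves at every step when alpha = 1.
mean_weights = np.mean(
    [stick_breaking_weights(1.0, 20, rng) for _ in range(5000)], axis=0
)
print(np.round(mean_weights[:5], 3))
```

With a larger `alpha` the weights decay more slowly, so more components receive non-negligible mass.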
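On the conjugacy point, here is a sketch of what "sampling without conjugacy" can look like: a random-walk Metropolis sampler for a Normal mean with a Cauchy prior, a pair that has no closed-form posterior. The toy model, step size, and function names are assumptions for illustration only; in a real mixture model this kind of update would replace just the non-conjugate Gibbs step.

```python
import numpy as np

def log_posterior(mu, data, prior_scale=1.0):
    # Non-conjugate pair: Cauchy(0, prior_scale) prior, Normal(mu, 1) likelihood
    # (both up to additive constants).
    log_prior = -np.log1p((mu / prior_scale) ** 2)
    log_lik = -0.5 * np.sum((data - mu) ** 2)
    return log_prior + log_lik

def random_walk_metropolis(data, num_steps, step_size, rng, init=0.0):
    samples = np.empty(num_steps)
    current = init
    current_lp = log_posterior(current, data)
    accepted = 0
    for t in range(num_steps):
        proposal = current + step_size * rng.standard_normal()
        proposal_lp = log_posterior(proposal, data)
        # Symmetric proposal, so accept with probability min(1, p(proposal) / p(current)).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        samples[t] = current
    return samples, accepted / num_steps

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=200)
samples, acc_rate = random_walk_metropolis(data, num_steps=5000, step_size=0.3, rng=rng)
posterior_mean = samples[1000:].mean()  # discard burn-in
print(posterior_mean, acc_rate)
```

Only the unnormalized log posterior is needed, which is why the lack of a closed form is not a problem for sampling.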