Using stick breaking algorithm example but receiving gradient problem

Thank you so much, junpenglao. Your post not only helped with the main post but also gave me invaluable tools for future problems I might have. Unfortunately, I’m still getting infinite energies, but perhaps that is just because, as you said, they’re hard to sample. I have some follow-up questions, if you don’t mind.

  1. How would you construct a more robust model for this hierarchical data? I was thinking perhaps a categorical probability?

  2. Why did the original example not need normalization for the mixtures?

Thank you so much!