Using stick breaking algorithm example but receiving gradient problem

Depends on your model. From the data generation, it seems that it would be best to model it as two linear regression with a latent break point. Currently, you are modelling the mixture weight as a linear function of X also, which is quite difficult and not at all realistic to your data generation process.

When you are using lots of component (large K), the stick-breaking is almost summed to 1.