I have something close that’s giving my best result yet. It uses a DensityDist, like the “Gaussian Mixture Model with ADVI” example in the PyMC3 docs. I’d like to try your idea of applying a softmax constraint, but I’m not sure where to apply it. Is it an additional factor in the “logps” variable? I have a GitHub notebook, modified to use my data, if you’d like to take a peek :).
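
To make the question concrete, here is a minimal plain-Python sketch of how I currently understand it. This is not the code from the docs example or from my notebook; the names `z`, `softmax`, `normal_logpdf`, and `mixture_logp` are made up for illustration, and I'm assuming one-dimensional Gaussian components. The guess is that the softmax turns unconstrained values into mixture weights, and the log of each weight is added to the matching component's log-density before the log-sum-exp:

```python
import math

def softmax(z):
    # Map unconstrained values to positive weights that sum to 1.
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def normal_logpdf(x, mu, sigma):
    # Log-density of a 1-D Gaussian (assumed component type).
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)

def logsumexp(vals):
    # Stable log(sum(exp(vals))).
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def mixture_logp(x, z, mus, sigmas):
    # Weights come from a softmax over the unconstrained vector z.
    pis = softmax(z)
    # One entry per component: log weight + component log-density.
    logps = [math.log(p) + normal_logpdf(x, mu, s)
             for p, mu, s in zip(pis, mus, sigmas)]
    return logsumexp(logps)
```

If that is roughly what you meant, then in the model the weights would presumably be a softmax of some unconstrained variable rather than an extra entry in “logps”, but I may be misreading your suggestion.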