Yeah, I’m not a fan of this kind of mixture model: they are quite hard to fit and they don’t have the same flexibility as hierarchical models.
I don’t know your use case and data, and it’s a big topic so I can’t expand much on it here, but the clusters can basically be any unordered entity (individual, county, state, year, hospital…).
I’m not sure what you call “the dummy variable approach”, but I think it is indeed close to the hierarchical approach. The differences are that in the hierarchical approach you use index variables (much more flexible), and you also have hyperpriors that model the overall population (i.e. across clusters). Those hyperpriors pool information across clusters, which shrinks the within-cluster parameters towards the overall mean.
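To make those two points a bit more concrete, here is a minimal pure-Python sketch (not a PyMC model). All data and names (`cluster_idx`, `alpha`, `partial_pool`) are made up for illustration. The first part shows that an index variable gives the same linear predictor as 0/1 dummy columns. The second part shows shrinkage in the simplest normal-normal case, under the strong assumption that the population mean `mu`, the between-cluster sd `tau` and the within-cluster sd `sigma` are known; in a real hierarchical model `mu` and `tau` get hyperpriors and are estimated from the data.

```python
# Hypothetical toy data: 6 observations from 3 clusters (e.g. counties).
cluster_idx = [0, 0, 0, 0, 1, 2]   # index variable: one integer per observation
y = [1.0, 1.2, 0.8, 1.0, 3.0, -1.0]
n_clusters = 3

# Dummy-variable encoding of the same information: one 0/1 column per cluster.
dummies = [[1.0 if j == k else 0.0 for k in range(n_clusters)]
           for j in cluster_idx]

alpha = [0.5, 2.0, -0.3]           # arbitrary per-cluster intercepts

# Index-variable version: just look up each observation's intercept.
mu_index = [alpha[j] for j in cluster_idx]
# Dummy-variable version: dot product of each dummy row with alpha.
mu_dummy = [sum(a * d for a, d in zip(alpha, row)) for row in dummies]


def partial_pool(y, cluster_idx, n_clusters, mu, tau, sigma):
    """Posterior mean of each cluster intercept in a normal-normal model.

    Assumes mu, tau and sigma are known (an illustration-only shortcut).
    Each estimate is a precision-weighted average of the cluster's own
    sample mean and the population mean mu.
    """
    estimates = []
    for k in range(n_clusters):
        obs = [v for v, j in zip(y, cluster_idx) if j == k]
        n = len(obs)
        ybar = sum(obs) / n
        # Weight on the cluster's own data; grows with n.
        w = (n / sigma**2) / (n / sigma**2 + 1 / tau**2)
        estimates.append(w * ybar + (1 - w) * mu)
    return estimates


pooled = partial_pool(y, cluster_idx, n_clusters, mu=1.0, tau=1.0, sigma=1.0)
print(mu_index)
print(mu_dummy)
print(pooled)
```

The clusters with a single observation get pulled halfway towards `mu`, while a cluster with more data keeps more weight on its own sample mean. That is the shrinkage I was referring to.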
If it’s useful to you, I really recommend chapter 12 of McElreath’s Rethinking (1st edition). It’s a very thorough and pedagogical explanation, and here is the port of this chapter to PyMC3.
I also updated the radon-level notebook on PyMC’s website, but it’s not online yet.