That’s true if you use the cluster identities in the summary statistics. But most of what one cares about from clustering is not a function of those identities. You can calculate the expected log predictive density (ELPD) just fine for leave-one-out cross-validation, for example. The probability that two elements are in the same cluster can also be calculated with posterior predictive inference, as can the density of a new point. You can do posterior predictive checks no problem. None of this runs into label switching because it marginalizes over the label identities, so it should all lead to reasonable ESS and R-hat values.
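To make the co-clustering point concrete, here is a minimal sketch (the array of posterior assignment draws is hypothetical, not from any particular model): the probability that two points share a cluster is just the average of an indicator over draws, and relabeling the clusters within each draw leaves it unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of cluster assignments: one row per MCMC
# draw, one column per data point. The labels 0..2 are arbitrary and may
# switch from draw to draw.
z_draws = rng.integers(0, 3, size=(4000, 10))

def co_clustering_prob(z_draws, i, j):
    # P(z_i == z_j): mean over draws of the indicator that points i and j
    # land in the same cluster. This is invariant to how clusters are
    # labeled within each draw.
    return np.mean(z_draws[:, i] == z_draws[:, j])

p = co_clustering_prob(z_draws, 0, 1)

# Apply an independent random relabeling of the clusters in every draw;
# the co-clustering probability is identical, since the indicator only
# asks whether two labels match, not what they are.
relabeled = np.stack([rng.permutation(3)[row] for row in z_draws])
p_relabeled = co_clustering_prob(relabeled, 0, 1)
```

The same reasoning covers ELPD and the predictive density of a new point: each is an expectation over the posterior that never compares labels across draws.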