Yeah, the softmax often makes the model unidentifiable, because the overall level of the logits no longer plays a role: adding the same constant to every entry leaves the output unchanged. In my problem, setting a Normal(0, 1) prior works well (I might even get rid of the HalfCauchy). Another solution is to restrict one of the columns in mu to be zero.
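To make the identifiability point concrete, here is a minimal plain-Python sketch (not taken from your model). The helper names `softmax` and `pin_last_to_zero` and the values in `mu` are made up for illustration. It shows that shifting every logit by a constant gives the same probabilities, and that pinning one entry to zero picks out a single representative without changing them.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pin_last_to_zero(logits):
    """Subtract the last entry so the reference category has logit 0."""
    ref = logits[-1]
    return [x - ref for x in logits]

mu = [0.3, -1.2, 2.0]            # hypothetical logits for three categories
shifted = [x + 5.0 for x in mu]  # same logits plus an arbitrary constant

p = softmax(mu)
p_shifted = softmax(shifted)
p_pinned = softmax(pin_last_to_zero(mu))

# Both comparisons hold up to floating-point tolerance.
same_after_shift = all(math.isclose(a, b) for a, b in zip(p, p_shifted))
same_after_pin = all(math.isclose(a, b) for a, b in zip(p, p_pinned))
print(same_after_shift, same_after_pin)
```

Since the likelihood only sees the probabilities, the sampler cannot tell `mu` from `shifted`. A tight prior such as Normal(0, 1) or a zero reference column removes that extra degree of freedom.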
The model you are running now seems fine. Did you try it with the defaults (just trace = pm.sample(1000, tune=1000, njobs=4), for example)?