Mixture of hierarchical model

model 1 is definitely in the right direction. However, you should still try to restrict the coefficient to make sure there is no mode switching. As a quick fix try adding this constraint into your model1:

    # break symmetry
    order_means_potential = pm.Potential('order_means_potential',
                                         tt.switch(mu_b[1]-mu_b[0] < 0, -np.inf, 0))