OK, I think I know why this happens: there is label switching in the mixture. For example, in the first trace pi = [.05, .95], but in the second trace pi = [.95, .05].
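To make the label switching point concrete, here is a minimal sketch in plain Python (not your model; the data, the component means and the helper names are made up for illustration). It shows that swapping the weights together with the component parameters leaves the mixture likelihood unchanged, which is why the sampler can land on either labelling.

```python
import math

def normal_logpdf(x, mu, sigma):
    # Log density of a Normal(mu, sigma) evaluated at x.
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def mixture_loglik(data, weights, mus, sigma=1.0):
    # Log-likelihood of a Gaussian mixture with a shared sigma (assumption for brevity).
    total = 0.0
    for x in data:
        terms = [math.log(w) + normal_logpdf(x, m, sigma) for w, m in zip(weights, mus)]
        top = max(terms)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total

data = [-2.1, 0.3, 4.8, 5.2, 5.9]  # made-up observations

# Same mixture, two labellings: weights and means are swapped together.
ll_a = mixture_loglik(data, [0.05, 0.95], [-2.0, 5.0])
ll_b = mixture_loglik(data, [0.95, 0.05], [5.0, -2.0])

# Swapping only the weights gives a genuinely different model.
ll_c = mixture_loglik(data, [0.95, 0.05], [-2.0, 5.0])

print(ll_a, ll_b, ll_c)
```

The first two values agree, so the likelihood alone cannot tell the two labellings apart; only the prior or a constraint on the parameters can.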
You can use an ordered transform to restrict the order of $\alpha$; see "Mixture models and breaking class symmetry".
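For intuition about what an ordered transform does, here is a rough sketch of the usual construction in plain Python. The function names are hypothetical and this is not the library's own implementation: the first element is left free and every later element is the previous one plus a positive increment, so the result is always increasing.

```python
import math

def ordered_forward(x):
    # Map an unconstrained vector to a strictly increasing one:
    # y[0] = x[0], y[i] = y[i-1] + exp(x[i]).
    y = [x[0]]
    for xi in x[1:]:
        y.append(y[-1] + math.exp(xi))
    return y

def ordered_backward(y):
    # Inverse map: recover the unconstrained values from an increasing vector.
    return [y[0]] + [math.log(b - a) for a, b in zip(y, y[1:])]

unconstrained = [1.5, -0.7, 0.2]          # made-up values the sampler might propose
alpha = ordered_forward(unconstrained)    # always satisfies alpha[0] < alpha[1] < alpha[2]
recovered = ordered_backward(alpha)

print(alpha)
print(recovered)
```

Because the sampler works on the unconstrained values, it can never propose a swapped ordering, which removes the mirrored labelling from the posterior.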
After some more thought and playing with the model, I think your problem is not related to label switching (plus, you are running only one chain, right?). In fact, when I reran the model, I got the same output as model 1 in your case quite consistently. So what gives? My intuition is that the posterior has multiple modes, one around pi = [.05, .95] and one around [.35, .65]. A more informative prior should help with the model identification issue.
I put up a Gist notebook in which the two ways of modelling the mixture regression converge to similar results: http://nbviewer.jupyter.org/gist/junpenglao/1907bf019906c125f63126ec4bf23880/Mixture_discourse.ipynb. It is important to run multiple chains and check the fit.
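As a rough illustration of why multiple chains matter, here is a sketch of the basic (non-split) Gelman-Rubin statistic in plain Python. The "chains" are fake draws I generate around the two modes mentioned above, not output from the notebook, and `gelman_rubin` and `fake_chain` are hypothetical helper names.

```python
import random
import statistics

def gelman_rubin(chains):
    # Basic potential scale reduction factor for equal-length chains.
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    var_hat = (n - 1) / n * within + between / n
    return (var_hat / within) ** 0.5

rng = random.Random(0)

def fake_chain(center, n=1000):
    # Made-up draws of pi[0] scattered around one posterior mode.
    return [rng.gauss(center, 0.02) for _ in range(n)]

# All four chains sit in the same mode.
agree = [fake_chain(0.05) for _ in range(4)]
# Two chains sit near 0.05 and two near 0.35.
disagree = [fake_chain(0.05), fake_chain(0.05), fake_chain(0.35), fake_chain(0.35)]

rhat_agree = gelman_rubin(agree)
rhat_disagree = gelman_rubin(disagree)
print(rhat_agree, rhat_disagree)
```

With a single chain there is nothing to compare against, so a sampler stuck in one mode looks perfectly healthy. With several chains, a value well above 1 flags that they disagree.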
It is also quite interesting in itself, as it clearly shows that although the marginalized and non-marginalized models give the same posterior, the effective sample size is much better with marginalization.
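In case it helps to see what "marginalizing" means here, below is a small sketch in plain Python with made-up numbers (it is not the notebook's model). The non-marginalized version keeps a discrete label z for each observation; the marginalized version sums z out, so the sampler only has to deal with continuous parameters. Both describe the same density for x.

```python
import math

def normal_pdf(x, mu, sigma):
    # Density of a Normal(mu, sigma) evaluated at x.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

weights, mus, sigma = [0.05, 0.95], [-2.0, 5.0], 1.0  # illustrative values
x = 4.3  # one made-up observation

# Non-marginalized view: joint p(x, z) = p(z) * p(x | z) for each label z.
joint = [w * normal_pdf(x, m, sigma) for w, m in zip(weights, mus)]

# Marginalized view: sum the label out, p(x) = sum_z p(x, z).
marginal = sum(joint)

# The same quantity computed on the log scale with log-sum-exp.
log_terms = [math.log(j) for j in joint]
top = max(log_terms)
log_marginal = top + math.log(sum(math.exp(t - top) for t in log_terms))

# The labels can still be recovered afterwards as p(z | x).
responsibilities = [j / marginal for j in joint]
print(log_marginal, responsibilities)
```

This only shows that the two formulations agree on the density; the difference in effective sample size comes from how the sampler has to explore the discrete labels, which this sketch does not try to reproduce.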