Indeed.
import arviz as az
import numpy as np
import pymc as pm

toydata = np.ones(100)

# Shared-t variant: each mixture component is a full 100-dimensional MvNormal,
# so a single latent component choice applies to all observations at once.
with pm.Model() as marg2toy_model:
    w = pm.Dirichlet("w", a=np.ones(5))
    Ndists = [
        pm.MvNormal.dist(mu=t * np.ones_like(toydata), cov=np.eye(toydata.shape[0]))
        for t in range(5)
    ]
    Y_obs = pm.Mixture("Y_obs", w=w, comp_dists=Ndists, observed=toydata)
    marg2toy_trace = pm.sample()

az.plot_trace(marg2toy_trace, legend=True)

Now it’s clear. There were indeed two different models involved: either a single t shared by all observations, or one t_i per observation. In the first case the marginal likelihood is \sum_t \left[\prod_i P(y_i|t)\right] P(t), and in the second it is \prod_i \sum_{t_i} P(y_i|t_i) P(t_i), which are obviously not the same regardless of how the summation is implemented. With one t_i per observation, little can be inferred about each individual t_i except through pooling, so w has to carry most of the information. With a shared t, w matters less, because \prod_i P(y_i|t) already makes a strong choice on its own.
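
For contrast, here is a minimal sketch of the per-observation variant (perobs_model is my name for it; it reuses toydata from above). With univariate Normal components, pm.Mixture applies element-wise, so each y_i gets its own marginalized t_i:

# Per-observation variant: univariate components, so the mixture is applied
# element-wise and each y_i carries its own (marginalized) latent t_i.
with pm.Model() as perobs_model:
    w = pm.Dirichlet("w", a=np.ones(5))
    Ndists = [pm.Normal.dist(mu=t, sigma=1.0) for t in range(5)]
    Y_obs = pm.Mixture("Y_obs", w=w, comp_dists=Ndists, observed=toydata)
    perobs_trace = pm.sample()

az.plot_trace(perobs_trace, legend=True)

On this all-ones toy data, the two traces should illustrate the asymmetry described above: the per-observation model concentrates the information in w, while the shared-t model effectively picks one component.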