Restricting the order of the location parameter (lam in this case) usually prevents label switching and helps convergence, although your model can still be unidentifiable if you don't have enough data or the latent mixture locations are too close together. For more detail see Mike Betancourt's case study (I have a PyMC3 port of it).
In this case, applying a similar parameter constraint:
import numpy as np
import pymc3 as pm
import pymc3.distributions.transforms as tr

# chain the log and ordered transforms so that lam is both positive and sorted,
# which prevents the mixture components from swapping labels during sampling
chain_tran = tr.Chain([tr.log, tr.ordered])

with pm.Model() as model:
    lam = pm.Exponential('lam', lam=1,
                         shape=2,
                         transform=chain_tran,
                         testval=np.asarray([1., 1.5]))
    pois = pm.Poisson.dist(mu=lam)
    w = pm.Dirichlet('w', a=np.array([1, 1]))
    like = pm.Mixture('like', w=w, comp_dists=pois, observed=data)
This should make the inference a bit easier.
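For completeness, here is a minimal sketch of how you might run the sampler on simulated data (the seed, sample sizes, and true rates of 1 and 5 are just assumptions for illustration, not taken from your dataset):

import numpy as np
import pymc3 as pm

# simulate two-component Poisson data (rates and sizes are assumed for illustration)
np.random.seed(123)
data = np.concatenate([np.random.poisson(1.0, size=200),
                       np.random.poisson(5.0, size=200)])

# build `model` as above with this `data`, then sample:
with model:
    trace = pm.sample(1000, tune=1000, chains=2)

pm.summary(trace)

With the ordered transform in place, lam[0] should converge to the smaller rate and lam[1] to the larger one consistently across chains.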