In this particular use case, I think the problem is the transformation in the mixture. So I tried making the two models truly identical:
import numpy as np
import pymc3 as pm

with pm.Model() as model:
    mu_1 = pm.VonMises('mu_1', mu=0, kappa=1)
    kappa_1 = pm.Gamma('kappa_1', 1, 1)
    vm_1 = pm.VonMises.dist(mu=mu_1, kappa=kappa_1)
    # fixed equal weights instead of pm.Dirichlet('w', np.ones(2))
    w = np.ones(2)*.5
    # both components are the same distribution
    vm_comps = [vm_1, vm_1]
    vm = pm.Mixture('vm', w, vm_comps)
This way, vm should be identical to vm_1 in model1; however, ADVI still gives the same error.
The reason is that the Mixture class does not have a default transformation. This means the mixture in this case has support on [-pi, pi], but the approximation has support on [-inf, inf]. This is usually fine when Mixture is used with observed data, but here the approximation sometimes moves outside the support and raises an error.
The solution: assign a transformation:
import pymc3.distributions.transforms as tr

with pm.Model() as model:
    mu_1 = pm.VonMises('mu_1', mu=0, kappa=1)
    kappa_1 = pm.Gamma('kappa_1', 1, 1)
    vm_1 = pm.VonMises.dist(mu=mu_1, kappa=kappa_1)
    w = np.ones(2)*.5 # pm.Dirichlet('w', np.ones(2))
    vm_comps = [vm_1, vm_1]
    vm = pm.Mixture('vm', w, vm_comps, transform=tr.circular)
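To illustrate why the transform matters, here is a rough standalone sketch in plain Python (no PyMC3 needed). It assumes the circular transform maps an unconstrained value back onto the circle via arctan2(sin(y), cos(y)), which is my understanding of what it does. The helper name wrap_to_circle and the Gaussian with standard deviation 3 are made up for the demonstration; they only stand in for draws from an unconstrained approximation.

```python
import math
import random


def wrap_to_circle(y):
    """Hypothetical helper: map any real number into [-pi, pi]."""
    return math.atan2(math.sin(y), math.cos(y))


random.seed(0)

# Stand-in for draws from an unconstrained Gaussian approximation
# (the standard deviation of 3 is an arbitrary choice for illustration).
draws = [random.gauss(0.0, 3.0) for _ in range(1000)]

# Without a transform, some draws land outside the Von Mises support.
outside = [y for y in draws if abs(y) > math.pi]

# With the wrapping applied, every draw is back inside [-pi, pi].
wrapped = [wrap_to_circle(y) for y in draws]

print(f"{len(outside)} of {len(draws)} raw draws fall outside [-pi, pi]")
print("all wrapped draws inside support:",
      all(-math.pi <= y <= math.pi for y in wrapped))
```

Values already inside the support are left unchanged by the wrapping, and anything outside is shifted by a multiple of 2*pi, so evaluating the density on the wrapped draws no longer fails.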
I believe now it should be much more robust. However, my previous comment still applies: