NaN occurred in optimization in a VonMises mixture model

I am getting the error “NaN occurred in optimization” for a very simple Von Mises mixture model:

import pymc3 as pm
import math
import numpy as np

with pm.Model() as model:
    mu_1 = pm.VonMises('mu_1', mu=0, kappa=1)
    kappa_1 = pm.Gamma('kappa_1', 1, 1)
    vm_1 = pm.VonMises.dist(mu=mu_1, kappa=kappa_1)
    w = pm.Dirichlet('w', np.ones(2))
    vm_comps = [vm_1, vm_1]
    vm = pm.Mixture('vm', w, vm_comps)
for RV in model.basic_RVs:
    print(RV.name, RV.logp(model.test_point))
print(model.logp(model.test_point))
# Output:
# mu_1_circular__ -1.0737914249146185
# kappa_1_log__ -1.0
# w_stickbreaking__ -1.3862943611198906
# vm -1.0737914249146185
# -4.533877210949127
with model:
    approx = pm.fit(obj_optimizer=pm.adagrad_window(learning_rate=2e-4))
# FloatingPointError: NaN occurred in optimization.

I have tried debugging using the instructions here, but I am still getting the same error.

This model seems simpler than the ones in the issue above (it’s spiritually equivalent to a simple Von Mises model).
Are there any suggestions on how to possibly debug this error in the PyMC3 code base?

Thanks!

(I moved this to a new topic)
Actually I cannot reproduce the error - are you on the master branch?
Also, I know this is probably not the final model you have in mind, but you are using the same component twice in the mixture.
And FYI, you can now do model.check_test_point() to check your model instead of using the for-loop.
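
For example (just a quick sketch; the exact output formatting may differ between versions), this should print the same per-RV log probabilities as the loop above:

print(model.check_test_point())
# should list the logp of mu_1_circular__, kappa_1_log__, w_stickbreaking__
# and vm at the test point, matching the output shown above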

Thanks for the response.

The error only happens intermittently, it seems; if you set the learning rate back to the default (1e-3) or increase the number of iterations, it happens more often.
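
For example, rerunning the fit like this (n=30000 is just an arbitrary larger iteration count for illustration) seems to trigger the NaN much more often:

with model:
    approx = pm.fit(n=30000,
                    obj_optimizer=pm.adagrad_window(learning_rate=1e-3))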

I am on the latest stable version, but I will try upgrading to master.

Regarding the model: I created an MWE (minimal working example) to make it easier to debug and discuss the error, which is why I am using the same component twice.

Right, I see. You can try tracking the parameters to see which one might be causing the error: http://docs.pymc.io/notebooks/variational_api_quickstart.html#Tracking-parameters
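
Roughly along the lines of the quickstart (just a sketch, not tested against your exact model):

import matplotlib.pyplot as plt

with model:
    advi = pm.ADVI()
    tracker = pm.callbacks.Tracker(
        mean=advi.approx.mean.eval,  # callable that returns the current mean
        std=advi.approx.std.eval,    # callable that returns the current std
    )
    approx = advi.fit(callbacks=[tracker],
                      obj_optimizer=pm.adagrad_window(learning_rate=2e-4))

plt.plot(tracker['mean'])
plt.plot(tracker['std'])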

Since you are not assigning any observations to the mixture, VI approximates the mixture with a Gaussian, which might not work so well.

I have updated to master now.

I have tried tracking the parameters, but I could not find any significant difference from the parameters for the following equivalent model:

with pm.Model() as model2:
    mu_1 = pm.VonMises('mu_1', mu=0, kappa=1)
    kappa_1 = pm.Gamma('kappa_1', 1, 1)
    vm_1 = pm.VonMises('vm_1', mu=mu_1, kappa=kappa_1)

The above model seems to work perfectly fine with ADVI (regardless of whether the learning rate is 1e-3 or 2e-4).
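
That is, something like the following finishes without the FloatingPointError for me, with either learning rate:

with model2:
    approx2 = pm.fit(obj_optimizer=pm.adagrad_window(learning_rate=1e-3))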

I find this a bit weird, since the log probability of vm in model and vm_1 in model2 should be equivalent.
I have added pictures of the parameter plots below, in case they are of any help.

Mixture von Mises model (model): [tracked-parameter plot]

Normal von Mises model (model2): [tracked-parameter plot]

You should not think of the log probability or likelihood as one value - it is better to think of them as a function/mapping/tensor. As functions, vm and vm_1 take different inputs, and the spaces induced by the inputs and parameterizations could be very different.
To see this more clearly, you could try sampling from the two models.
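
Something along these lines (just a sketch; the variable names are only for illustration) would show how different the induced spaces are:

with model:
    trace_mix = pm.sample(1000)
with model2:
    trace_plain = pm.sample(1000)

# compare the marginal draws of the mixture RV and the plain von Mises RV
print(trace_mix['vm'].mean(), trace_mix['vm'].std())
print(trace_plain['vm_1'].mean(), trace_plain['vm_1'].std())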

More generally, looking at the tracked values, it seems they are not converging towards a (local) minimum. ADVI might not be a good inference scheme here.


In this particular use case, I think the problem is the transformation in the mixture. So I tried making the two models really identical:

with pm.Model() as model:
    mu_1 = pm.VonMises('mu_1', mu=0, kappa=1)
    kappa_1 = pm.Gamma('kappa_1', 1, 1)
    vm_1 = pm.VonMises.dist(mu=mu_1, kappa=kappa_1)
    w = np.ones(2)*.5 # pm.Dirichlet('w', np.ones(2))
    vm_comps = [vm_1, vm_1]
    vm = pm.Mixture('vm', w, vm_comps)

In this way vm should really be identical to vm_1 in model2; however, ADVI still gives the same error.

The reason is that the Mixture class does not have a default transformation. This means the mixture in this case has support on [-pi, pi], but the approximation has support on [-inf, inf]. Usually when we use Mixture with observed data this is fine, but here the approximation sometimes goes out of the support and gives an error.
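
You can see this in the free RVs of the model above: mu_1 and kappa_1 carry a transform suffix and are approximated on the unconstrained real line, while vm does not (a quick check; the names in the comment are roughly what I would expect):

print([rv.name for rv in model.free_RVs])
# roughly: ['mu_1_circular__', 'kappa_1_log__', 'vm']
# mu_1 and kappa_1 are mapped to the unconstrained space, vm is not
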
The solution: assign a transformation:

import pymc3.distributions.transforms as tr

with pm.Model() as model:
    mu_1 = pm.VonMises('mu_1', mu=0, kappa=1)
    kappa_1 = pm.Gamma('kappa_1', 1, 1)
    vm_1 = pm.VonMises.dist(mu=mu_1, kappa=kappa_1)
    w = np.ones(2)*.5 # pm.Dirichlet('w', np.ones(2))
    vm_comps = [vm_1, vm_1]
    vm = pm.Mixture('vm', w, vm_comps, transform=tr.circular)
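
As a quick check (just a sketch, reusing the same fit call as before), vm should now show up as a transformed free RV and the fit should go through:

print([rv.name for rv in model.free_RVs])
# 'vm' should now appear with a circular transform suffix instead of plain 'vm'

with model:
    approx = pm.fit(obj_optimizer=pm.adagrad_window(learning_rate=2e-4))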

I believe it should be much more robust now. However, my previous comment still applies: ADVI might not be a good inference scheme here.


Great, thank you very much! :smile:

I apologize if I have asked a lot of beginner questions lately, but I am grateful for the quick responses.

You are welcome! No question is too small - for example, I never realized the potential problem with transformations in Mixture until now.