NaN occurred in optimization in a VonMises mixture model

I am getting the error “NaN occurred in optimization” for a very simple Von Mises mixture model:

import pymc3 as pm
import math
import numpy as np

with pm.Model() as model:
    mu_1 = pm.VonMises('mu_1', mu=0, kappa=1)
    kappa_1 = pm.Gamma('kappa_1', 1, 1)
    vm_1 = pm.VonMises.dist(mu=mu_1, kappa=kappa_1)
    w = pm.Dirichlet('w', np.ones(2))
    vm_comps = [vm_1, vm_1]
    vm = pm.Mixture('vm', w, vm_comps)
for RV in model.basic_RVs:
    print(RV.name, RV.logp(model.test_point))
print(model.logp(model.test_point))
# Output:
# mu_1_circular__ -1.0737914249146185
# kappa_1_log__ -1.0
# w_stickbreaking__ -1.3862943611198906
# vm -1.0737914249146185
# -4.533877210949127
with model:
    approx = pm.fit(obj_optimizer=pm.adagrad_window(learning_rate=2e-4))
# FloatingPointError: NaN occurred in optimization.

I have tried debugging using the instructions here, but I am still getting the same error.

This model seems simpler than the ones in the issue above (it’s spiritually equivalent to a simple Von Mises model).
Are there any suggestions on how to possibly debug this error in the PyMC3 code base?

Thanks!

(I moved this to a new topic)
Actually I cannot reproduce the error - are you on the master branch?
Also, I know this is probably not the final model you have in mind, but you are using the same component twice in the mixture.
And FYI, you can now do model.check_test_point() to check your model instead of using the for-loop.
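
For example (just a quick sketch; the exact output formatting may differ between versions), this should print the same per-RV log probabilities as the loop above:

print(model.check_test_point())
# should list the logp of mu_1_circular__, kappa_1_log__, w_stickbreaking__
# and vm at the test point, matching the output shown above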

Thanks for the response.

The error only happens intermittently, it seems; if you set the learning rate back to the default (1e-3) or increase the number of iterations, it happens more often.
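
For example, rerunning the fit like this (n=30000 is just an arbitrary larger iteration count for illustration) seems to trigger the NaN much more often:

with model:
    approx = pm.fit(n=30000,
                    obj_optimizer=pm.adagrad_window(learning_rate=1e-3))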

I am on the latest stable version, but I will try upgrading to master.

Regarding the model: I created an MWE (minimal working example) to make it easier to debug and discuss the error, which is why I am using the same component twice.

Right, I see. You can try tracking the parameters to see which one might be causing the error: http://docs.pymc.io/notebooks/variational_api_quickstart.html#Tracking-parameters
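
Roughly along the lines of the quickstart (just a sketch, not tested against your exact model):

import matplotlib.pyplot as plt

with model:
    advi = pm.ADVI()
    tracker = pm.callbacks.Tracker(
        mean=advi.approx.mean.eval,  # callable that returns the current mean
        std=advi.approx.std.eval,    # callable that returns the current std
    )
    approx = advi.fit(callbacks=[tracker],
                      obj_optimizer=pm.adagrad_window(learning_rate=2e-4))

plt.plot(tracker['mean'])
plt.plot(tracker['std'])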

Since you are not assigning any observations to the mixture, VI approximates the mixture with a Gaussian, which might not work so well.

I have updated to master now.

I have tried tracking the parameters, but I could not find any significant difference from the parameters for the following equivalent model:

with pm.Model() as model2:
    mu_1 = pm.VonMises('mu_1', mu=0, kappa=1)
    kappa_1 = pm.Gamma('kappa_1', 1, 1)
    vm_1 = pm.VonMises('vm_1', mu=mu_1, kappa=kappa_1)

The above model seems to work perfectly fine with ADVI (regardless of whether the learning rate is 1e-3 or 2e-4).
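
That is, something like the following finishes without the FloatingPointError for me, with either learning rate:

with model2:
    approx2 = pm.fit(obj_optimizer=pm.adagrad_window(learning_rate=1e-3))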

I find this a bit weird, since the log probability of vm in model and vm_1 in model2 should be equivalent.
I have added pictures of the parameter plots below, in case they are of any help.

Mixture von Mises model (model): [tracked-parameter plot]

Normal von Mises model (model2): [tracked-parameter plot]

You should not think of the log probability or likelihood as one value - it is better to think of them as a function/mapping/tensor. As functions, vm and vm_1 take different inputs, and the spaces induced by the inputs and parameterizations could be very different.
To see this more clearly, you could try sampling from the two models.
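
Something along these lines (just a sketch; the variable names are only for illustration) would show how different the induced spaces are:

with model:
    trace_mix = pm.sample(1000)
with model2:
    trace_plain = pm.sample(1000)

# compare the marginal draws of the mixture RV and the plain von Mises RV
print(trace_mix['vm'].mean(), trace_mix['vm'].std())
print(trace_plain['vm_1'].mean(), trace_plain['vm_1'].std())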

More generally, looking at the tracked values, it seems they are not converging towards a (local) minimum. ADVI might not be a good inference scheme here.


In this particular use case, I think the problem is the transformation in the mixture. So I tried making the two models really identical:

with pm.Model() as model:
    mu_1 = pm.VonMises('mu_1', mu=0, kappa=1)
    kappa_1 = pm.Gamma('kappa_1', 1, 1)
    vm_1 = pm.VonMises.dist(mu=mu_1, kappa=kappa_1)
    w = np.ones(2)*.5 # pm.Dirichlet('w', np.ones(2))
    vm_comps = [vm_1, vm_1]
    vm = pm.Mixture('vm', w, vm_comps)

In this way vm should really be identical to vm_1 in model2; however, ADVI still gives the same error.

The reason is that the Mixture class does not have a default transformation. This means the mixture in this case has support on [-pi, pi], but the approximation has support on [-inf, inf]. Usually when we use Mixture with observed data this is fine, but here the approximation sometimes goes out of the support and gives an error.
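
You can see this in the free RVs of the model above: mu_1 and kappa_1 carry a transform suffix and are approximated on the unconstrained real line, while vm does not (a quick check; the names in the comment are roughly what I would expect):

print([rv.name for rv in model.free_RVs])
# roughly: ['mu_1_circular__', 'kappa_1_log__', 'vm']
# mu_1 and kappa_1 are mapped to the unconstrained space, vm is not
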
The solution: assign a transformation:

import pymc3.distributions.transforms as tr

with pm.Model() as model:
    mu_1 = pm.VonMises('mu_1', mu=0, kappa=1)
    kappa_1 = pm.Gamma('kappa_1', 1, 1)
    vm_1 = pm.VonMises.dist(mu=mu_1, kappa=kappa_1)
    w = np.ones(2)*.5 # pm.Dirichlet('w', np.ones(2))
    vm_comps = [vm_1, vm_1]
    vm = pm.Mixture('vm', w, vm_comps, transform=tr.circular)
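
As a quick check (just a sketch, reusing the same fit call as before), vm should now show up as a transformed free RV and the fit should go through:

print([rv.name for rv in model.free_RVs])
# 'vm' should now appear with a circular transform suffix instead of plain 'vm'

with model:
    approx = pm.fit(obj_optimizer=pm.adagrad_window(learning_rate=2e-4))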

I believe it should be much more robust now. However, my previous comment still applies: ADVI might not be a good inference scheme here.


Great, thank you very much! :smile:

I apologize if I have asked a lot of beginner questions lately, but I am grateful for the quick responses.

You are welcome! No question is too small - for example, I never realized the potential problem with transformations in Mixture until now.