NaN occurred in optimization with ADVI

Nadheesh · April 16, 2018, 6:43am

I tried to fit the following logistic regression model using ADVI:

def invlogit(self, x):
    return tt.exp(x) / (1 + tt.exp(x))

with pm.Model() as self.logistic_model:
    alpha = pm.Normal('alpha', mu=0, sd=20)
    beta = pm.Normal('beta', mu=0, sd=20, shape=X.shape[1])

    mu = alpha + tt.dot(X, beta)
    p = pm.Deterministic('p', invlogit(mu))

    y = pm.Bernoulli('y', p=p, observed=y)
    apprx = pm.fit(1000)

I then got the following error:

Average Loss = 1.0766e+07:   0%|          | 0/10000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/nadheesh/PycharmProjects/mcmc_vs_variational/mcmc/models.py", line 319, in <module>
    lr.fit(X,y)
  File "/home/nadheesh/PycharmProjects/mcmc_vs_variational/mcmc/models.py", line 193, in fit
    apprx = pm.fit(10000, obj_optimizer=pm.adam(), obj_n_mc=20)
  File "/home/nadheesh/anaconda3/envs/dev/lib/python3.5/site-packages/pymc3/variational/inference.py", line 756, in fit
    return inference.fit(n, **kwargs)
  File "/home/nadheesh/anaconda3/envs/dev/lib/python3.5/site-packages/pymc3/variational/inference.py", line 135, in fit
    state = self._iterate_with_loss(0, n, step_func, progress, callbacks)
  File "/home/nadheesh/anaconda3/envs/dev/lib/python3.5/site-packages/pymc3/variational/inference.py", line 181, in _iterate_with_loss
    raise FloatingPointError('NaN occurred in optimization.')
FloatingPointError: NaN occurred in optimization.

I investigated this a little and found that it is caused by a NaN returned by step_function() when it computes the loss value. Moreover, if I change the Bernoulli distribution to a Normal distribution, the model can be trained without any error. However, I can’t understand why this error appears when using the Bernoulli likelihood.

I would appreciate it if someone could help me resolve this issue.

junpenglao · April 16, 2018, 7:52am

There are a few other discussions on this topic. Did you have a look? https://discourse.pymc.io/search?q=NaN%20occurred%20in%20optimization

Nadheesh · April 16, 2018, 10:00am

Thanks for pointing those out, @junpenglao. I checked whether they are relevant, but I could not find a solution to my question in those topics.

Maybe I’m too dumb to understand the solution from reading those posts. I would appreciate it if you could help me understand why this error occurs.

step_function returns NaN during the approximation. PyMC3 then checks whether e is NaN and throws this error.

I tried using a small learning rate, and I checked the initialization as well.

Do you have any other suggestions, @junpenglao?

junpenglao · April 16, 2018, 11:22am

Usually, if you see the NaN problem in the first iteration, there is a problem with the approximation setup that makes the score function (the target being optimized) invalid.
So, I would do the following:

1. Check that the original model is set up correctly, see e.g. here.
2. Print the test value of the deterministic node to make sure that the value is valid, e.g. p.tag.test_value in your case.
3. Make sure the starting value of the approximation is valid:

point = apprx.groups[0].bij.rmap(apprx.params[0].eval())
point #check to see there is no NaN or Inf

for var in logistic_model.free_RVs:
    print(var.name, var.logp(point))

If all the steps above pass, the problem lies in how the approximation's score function is set up, which is not that easy to diagnose. I would put everything in a Jupyter notebook and trace the error in %debug mode.
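The validity check on the starting point can be made mechanical. The following is a minimal sketch and not part of PyMC3: `find_nonfinite` is a hypothetical helper that takes a dict of variable names to values, such as the `point` built above, and reports the entries that contain NaN or Inf. The example dict at the bottom is made up for illustration.

```python
import numpy as np

def find_nonfinite(point):
    """Return {name: count} for every entry of `point` holding NaN or Inf.

    `point` is assumed to be a dict mapping variable names to scalars or
    arrays, like the one returned by bij.rmap(...) in the snippet above.
    """
    bad = {}
    for name, value in point.items():
        arr = np.asarray(value, dtype=float)
        n_bad = int(np.sum(~np.isfinite(arr)))
        if n_bad:
            bad[name] = n_bad
    return bad

# Made-up starting point, for illustration only.
example_point = {
    "alpha": 0.0,
    "beta": np.array([0.1, np.nan, np.inf]),
}
report = find_nonfinite(example_point)
print(report)
```

An empty result means every value in the point is finite, so the search can move on to the per-variable logp values.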

Nadheesh · April 18, 2018, 12:28pm

I found out what caused the error.

It had nothing to do with the approximation setup, it seems. I had not normalized (scaled between 0 and 1) the datasets that I used for training. For some reason, step_function produces NaN when it is given values on a different scale.
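For reference, scaling each feature to the [0, 1] range can be sketched as below. This is an illustration with NumPy, not the code used in this thread; `minmax_scale_columns` and the sample matrix are hypothetical.

```python
import numpy as np

def minmax_scale_columns(X):
    """Rescale every column of X to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Avoid division by zero for columns where all values are equal.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (X - col_min) / safe_range

# Made-up design matrix with columns on very different scales.
X_raw = np.array([[1.0, 2000.0, 5.0],
                  [2.0, 4000.0, 5.0],
                  [3.0, 8000.0, 5.0]])
X_scaled = minmax_scale_columns(X_raw)
print(X_scaled)
```

When the model is later applied to new data, the minimum and range computed on the training set would normally be reused instead of being recomputed.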

I then investigated a little more. The error seems to appear only when abs(value) is large (< -100 or > 100). I also changed the priors and the likelihood, and found that the error persists only when the Bernoulli distribution is used.
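A plausible explanation, stated here as an assumption and not something confirmed in the thread, is numerical: the hand-written invlogit computes exp(x) / (1 + exp(x)). For large x, exp(x) can overflow to Inf, and Inf / Inf is NaN. Even before overflow, the result rounds to exactly 1.0, so the Bernoulli term log(1 - p) becomes log(0) = -Inf. The sketch below reproduces both effects in NumPy and compares them with an algebraically equivalent tanh form; the function names are illustrative.

```python
import numpy as np

def naive_invlogit(x):
    """exp(x) / (1 + exp(x)), written the same way as in the model above."""
    with np.errstate(over="ignore", invalid="ignore"):
        return np.exp(x) / (1 + np.exp(x))

def tanh_invlogit(x):
    """Same function as 0.5 * (1 + tanh(x / 2)); tanh does not overflow."""
    return 0.5 * (1 + np.tanh(0.5 * x))

x32 = np.float32(100.0)  # in float32, exp(100) overflows to inf
x64 = np.float64(100.0)  # in float64, exp(100) is still finite

print(naive_invlogit(x32))  # inf / inf
print(tanh_invlogit(x32))
print(naive_invlogit(x64))  # finite, but saturated
```

The tanh form removes the NaN but not the saturation, since p still rounds to exactly 1. Scaling the inputs, as described above, keeps x in a range where neither problem occurs. PyMC3 also ships pm.math.sigmoid and pm.math.invlogit, which may be preferable to a hand-written version.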

Murtuza_Shergadwala · August 6, 2019, 12:49pm

Hello, I am getting the "NaN occurred in optimization" error with ADVI and I am not able to figure out the issue.

Here is the code:

def std_cdf(x):
    """
    Calculates the standard normal cumulative distribution function.
    """
    return .5 + .5 * tt.erf(x / tt.sqrt(2.))

def std_pdf(x):
    return tt.exp(-x**2/2.0) / tt.sqrt(2*pi)

def Pwin_current(q, mean_oppq, scale_oppq):
    x=(q-mean_oppq)/scale_oppq
    return std_cdf(x)

def Pwin_next(q, mean_q_EI, scale_q_EI, mean_oppq, scale_oppq, s):
    s_1 = tt.maximum(s, q)
    s_2 = Pwin_current(s_1, mean_oppq, scale_oppq)
    tot = tt.sum(s_2, axis=1)
    return tot / 10000

dsize=df.tries[i-1]+1
Suc=df.S[i-dsize:i].values
Y_min=df.Y_min[i-dsize:i].values
mean_q_next=df.q_next[i-dsize:i].values
scale_q_next=df.q_scale[i-dsize:i].values
mean_q_EI=_shared(mean_q_next)
scale_q_EI=_shared(scale_q_next)
mean_q_EI=tt.reshape(mean_q_EI,(tt.shape(mean_q_EI)[0],1))
scale_q_EI=tt.reshape(scale_q_EI,(tt.shape(scale_q_EI)[0],1))
q=_shared(-Y_min)
q=tt.reshape(q,(tt.shape(Y_min)[0],1))
St=_shared(Suc)
St=tt.reshape(St,(tt.shape(St)[0],1))
srng = RandomStreams(seed=234)        
s=srng.normal(size=(tt.shape(mean_q_EI)[0],int_samp),avg=-mean_q_EI,std=scale_q_EI,ndim=None)

with pm.Model() as model:
        # PRIORS
        oq = pm.Normal('oq',mu=100,sd=20)
        sq = pm.Exponential('sq', 1.)
      
        alpha = pm.Exponential('alp', 1.)
        beta=pm.Exponential('beta', 1.)

        A=Pwin_next(q,-mean_q_EI, scale_q_EI, oq, sq,s)
        A=tt.reshape(A,(mean_q_EI.eval().shape[0],1))

        
        U=A-Pwin_current(q, oq, sq)
        P=1/(1+tt.exp(alpha*(prize*U-cost-beta)))
        ll= pm.Bernoulli('ll',p=P,observed=St)
        
        #for RV in model.basic_RVs:
            #print(RV.name, RV.logp(model.test_point))
        
        trace = pm.fit(n=10000, method='advi', model=model, obj_optimizer=pm.adagrad_window(learning_rate=2e-4))

I do know that the Bernoulli log-likelihood can return -infinity for some of the values my model samples, but I am not sure how to work around that.

This code runs perfectly fine with NUTS. ADVI is faster, but it gets stuck after 70% or 80% of the iterations, and I am not sure why.
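One common workaround for a -infinity log-likelihood, given here as a hedged sketch and not as a confirmed fix for this model, is to keep the probability strictly inside (0, 1) before it reaches the Bernoulli likelihood. The NumPy example below shows the effect; `bernoulli_logp`, `eps`, and the sample values are illustrative assumptions. In the Theano model, the analogous operation would be tt.clip(P, eps, 1 - eps).

```python
import numpy as np

def bernoulli_logp(y, p):
    """Elementwise Bernoulli log-likelihood: log(p) if y == 1, else log(1 - p)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(y == 1, np.log(p), np.log(1 - p))

y = np.array([1, 0, 1, 0])
# The first two probabilities are saturated and contradict the observations.
p = np.array([0.0, 1.0, 0.7, 0.2])

eps = 1e-6  # assumed tolerance; adjust for the problem at hand
p_clipped = np.clip(p, eps, 1 - eps)

raw = bernoulli_logp(y, p)
clipped = bernoulli_logp(y, p_clipped)
print(raw)
print(clipped)
```

Clipping keeps the loss finite, but the gradient is zero wherever the clip is active, so it hides saturation instead of removing it. Rescaling the inputs to the exponent is worth trying first.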
