NaN occurred in optimization with ADVI

I tried to infer the following logistic regression model using ADVI:

import pymc3 as pm
import theano.tensor as tt

def invlogit(x):
    return tt.exp(x) / (1 + tt.exp(x))

with pm.Model() as logistic_model:
    alpha = pm.Normal('alpha', mu=0, sd=20)
    beta = pm.Normal('beta', mu=0, sd=20, shape=X.shape[1])

    mu = alpha + tt.dot(X, beta)
    p = pm.Deterministic('p', invlogit(mu))

    y_obs = pm.Bernoulli('y', p=p, observed=y)
    apprx = pm.fit(1000)

Then I got the following error:

Average Loss = 1.0766e+07:   0%|          | 0/10000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/nadheesh/PycharmProjects/mcmc_vs_variational/mcmc/models.py", line 319, in <module>
    lr.fit(X,y)
  File "/home/nadheesh/PycharmProjects/mcmc_vs_variational/mcmc/models.py", line 193, in fit
    apprx = pm.fit(10000, obj_optimizer=pm.adam(), obj_n_mc=20)
  File "/home/nadheesh/anaconda3/envs/dev/lib/python3.5/site-packages/pymc3/variational/inference.py", line 756, in fit
    return inference.fit(n, **kwargs)
  File "/home/nadheesh/anaconda3/envs/dev/lib/python3.5/site-packages/pymc3/variational/inference.py", line 135, in fit
    state = self._iterate_with_loss(0, n, step_func, progress, callbacks)
  File "/home/nadheesh/anaconda3/envs/dev/lib/python3.5/site-packages/pymc3/variational/inference.py", line 181, in _iterate_with_loss
    raise FloatingPointError('NaN occurred in optimization.')
FloatingPointError: NaN occurred in optimization.

I tried to investigate this a little and found that it is caused by a NaN returned by step_function() when computing the loss. Moreover, if I change the Bernoulli distribution to a Normal distribution, the model trains without any error. However, I can't understand why this error is observed when using the Bernoulli likelihood.

I would appreciate it if someone could help me resolve this issue.

There are a few other discussions on this topic; did you have a look? https://discourse.pymc.io/search?q=NaN%20occurred%20in%20optimization


Thanks for pointing that out, @junpenglao. I checked whether they are relevant, but I could not find a solution to my question in those topics.

Maybe I'm too dumb to understand the solution from those posts; I'd appreciate it if you could help me understand why this error occurs.

The NaN is returned when step_function is called during the approximation; PyMC3 then checks whether the loss e is NaN and throws this error.

I tried using a smaller learning rate, and checked the initialization as well.
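
For reference, a sketch of how the step size can be lowered (via the learning_rate argument that pm.adam accepts):

apprx = pm.fit(10000, obj_optimizer=pm.adam(learning_rate=1e-4), obj_n_mc=20)  # smaller Adam step size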

Do you have any other suggestions, @junpenglao?

Usually, if you see the NaN problem in the first iteration, there is a problem with the approximation set-up that makes the score function (the target being optimized) invalid.
So, I would do the following:

  1. check if the original model is set up correctly, see e.g. here.
  2. print the test value of the deterministic node to make sure that the value is valid, e.g. by doing p.tag.test_value in your case (see the sketch after this list).
  3. make sure the starting value of the approximation is valid:
point = apprx.groups[0].bij.rmap(apprx.params[0].eval())
point  # check that there is no NaN or Inf
for var in logistic_model.free_RVs:
    print(var.name, var.logp(point))
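
A minimal sketch of steps 1 and 2, assuming the logistic_model and p from the first post are in scope (Model.check_test_point is the PyMC3 helper that prints each variable's logp at the test point):

print(logistic_model.check_test_point())  # step 1: per-RV logp at the test point; -inf signals a mis-specified model
print(p.tag.test_value)                    # step 2: should contain no NaN/Inf and stay strictly inside (0, 1)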

If all the steps above pass fine, there is a problem with setting up the approximation score function, which is not that easy to diagnose. I would put everything in a Jupyter notebook and trace the error in %debug mode.


I found out what caused the error.

It seems it had nothing to do with the approximation set-up. I had not normalized (scaled between 0 and 1) the datasets I used for training. For some reason, step_function then produces NaN when given values on a different scale.

Then I investigated a little bit more. It seems the error appears only when abs(value) is large (roughly when values fall below -100 or above 100). However, I also changed the priors and likelihood, and found that the error persists only when the Bernoulli distribution is used.
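
A small NumPy sketch (illustrative values, not from the original data) of why large linear predictors break the naive inverse-logit used above:

import numpy as np

def naive_invlogit(x):
    # same formula as tt.exp(x) / (1 + tt.exp(x)) in the first post
    return np.exp(x) / (1 + np.exp(x))

print(naive_invlogit(10.0))    # ~0.99995, still fine
print(naive_invlogit(100.0))   # rounds to exactly 1.0, so the Bernoulli logp of an observed 0 is -inf
print(naive_invlogit(800.0))   # exp overflows to inf, and inf / inf gives nan

With unscaled features and sd=20 priors, alpha + tt.dot(X, beta) can easily reach that range, which fits the Bernoulli-only failure; a numerically stabler alternative would be pm.math.invlogit (or tt.nnet.sigmoid), together with scaling the inputs.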

Hello, I am getting the "NaN occurred in optimization" error with ADVI and I am not able to figure out the issue.

Here is the code:

import numpy as np
import pymc3 as pm
import theano.tensor as tt
from theano import shared as _shared  # assumed alias for theano.shared
from theano.tensor.shared_randomstreams import RandomStreams

def std_cdf(x):
    """Standard normal cumulative distribution function."""
    return .5 + .5 * tt.erf(x / tt.sqrt(2.))

def std_pdf(x):
    """Standard normal probability density function."""
    return tt.exp(-x**2 / 2.0) / tt.sqrt(2 * np.pi)

def Pwin_current(q, mean_oppq, scale_oppq):
    # standardize q against the opponent distribution and evaluate the normal CDF
    x = (q - mean_oppq) / scale_oppq
    return std_cdf(x)

def Pwin_next(q, mean_q_EI, scale_q_EI, mean_oppq, scale_oppq, s):
    # Monte Carlo estimate: average Pwin_current over the draws in s,
    # with each draw floored at q; 10000 is presumably the number of draws (int_samp)
    s_1 = tt.maximum(s, q)
    s_2 = Pwin_current(s_1, mean_oppq, scale_oppq)
    tot = tt.sum(s_2, axis=1)
    return tot / 10000

# slice the training window from the data frame and wrap the arrays
# as Theano shared variables / tensors
dsize = df.tries[i-1] + 1
Suc = df.S[i-dsize:i].values
Y_min = df.Y_min[i-dsize:i].values
mean_q_next = df.q_next[i-dsize:i].values
scale_q_next = df.q_scale[i-dsize:i].values
mean_q_EI = _shared(mean_q_next)
scale_q_EI = _shared(scale_q_next)
mean_q_EI = tt.reshape(mean_q_EI, (tt.shape(mean_q_EI)[0], 1))
scale_q_EI = tt.reshape(scale_q_EI, (tt.shape(scale_q_EI)[0], 1))
q = _shared(-Y_min)
q = tt.reshape(q, (tt.shape(Y_min)[0], 1))
St = _shared(Suc)
St = tt.reshape(St, (tt.shape(St)[0], 1))

# Monte Carlo draws used inside Pwin_next
srng = RandomStreams(seed=234)
s = srng.normal(size=(tt.shape(mean_q_EI)[0], int_samp), avg=-mean_q_EI, std=scale_q_EI, ndim=None)

with pm.Model() as model:
    # PRIORS
    oq = pm.Normal('oq', mu=100, sd=20)
    sq = pm.Exponential('sq', 1.)

    alpha = pm.Exponential('alp', 1.)
    beta = pm.Exponential('beta', 1.)

    A = Pwin_next(q, -mean_q_EI, scale_q_EI, oq, sq, s)
    A = tt.reshape(A, (mean_q_EI.eval().shape[0], 1))

    U = A - Pwin_current(q, oq, sq)
    P = 1 / (1 + tt.exp(alpha * (prize * U - cost - beta)))
    ll = pm.Bernoulli('ll', p=P, observed=St)

    # for RV in model.basic_RVs:
    #     print(RV.name, RV.logp(model.test_point))

    trace = pm.fit(n=10000, method='advi', model=model, obj_optimizer=pm.adagrad_window(learning_rate=2e-4))

I do know that the Bernoulli log-likelihood can return -infinity when the model samples certain values... I am not sure how to work around that.
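
One possible work-around (a sketch, not something confirmed in this thread) is to keep p away from exactly 0 and 1 so the Bernoulli log-likelihood stays finite:

eps = 1e-6  # clip the probability into (eps, 1 - eps)
P = tt.clip(1 / (1 + tt.exp(alpha * (prize * U - cost - beta))), eps, 1 - eps)
ll = pm.Bernoulli('ll', p=P, observed=St)

Alternatively, since 1 / (1 + exp(z)) equals sigmoid(-z), passing the log-odds through Bernoulli's logit_p argument (if your PyMC3 version supports it) avoids forming p explicitly.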

This code runs perfectly fine with NUTS. ADVI is faster, but it gets stuck after 70% or 80% of the iterations... not sure why.