Gradient is NaN whenever the exponent is < 1

Hi everyone,

I am getting some NaN gradients and I cannot see the reason for them. Here is a minimal working example:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model_test:
    
    beta = pm.Exponential(
        'beta',
        lam=0.5
    )

    array = np.array([0., 0., 1.])
    
    probs = (1-array[:-1])**beta - (1-array[1:])**beta
    
    # I get NaN gradients whenever I try to exponentiate
    # by a number below 1
    value = probs**0.9
    
    tt.printing.Print('value')(
        value.flatten()
    )
    tt.printing.Print('grad')(
        tt.jacobian(value.flatten(), beta)
    )

The output of this is:

value __str__ = [0. 1.]
grad __str__ = [nan nan]

On the other hand, changing the exponent from 0.9 to 1.1 prints out:

value __str__ = [0. 1.]
grad __str__ = [0. 0.]

Any help appreciated! Thanks.

I’ve not had the chance to look closely yet, but have you tried computing these gradients by hand? If the exponent is smaller than 1, perhaps the gradient involves a negative exponent, i.e. a division, and since some of your input values are zero, I wonder if there could be a division by zero problem somewhere. If that’s the case, it might also disappear for some other input values.
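
Concretely, here is a quick chain-rule check in plain NumPy (this just mimics what I suspect Theano computes for the pow gradient, it is not literally the Theano graph): the derivative of p**0.9 involves p**(-0.1), which is infinite at p = 0, and inf times a zero inner derivative gives NaN.

import numpy as np

p = np.array(0.)   # the problematic entry of probs
dp = np.array(0.)  # its derivative w.r.t. beta

# chain rule for p**0.9: the inner exponent goes negative
print(0.9 * p ** -0.1 * dp)  # 0.9 * inf * 0 -> nan (plus a divide warning)

# for p**1.1 the inner exponent stays positive
print(1.1 * p ** 0.1 * dp)   # 1.1 * 0 * 0 -> 0.0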

Thanks! I think I solved the original problem, but I have run into another issue where something stranger might be going on. Here is some more code:

def print_value_theano(value, wrt):
    # Helper: print a tensor's value and its Jacobian w.r.t. `wrt`.
    tt.printing.Print('value')(
        value
    )
    tt.printing.Print('grad')(
        tt.jacobian(value.flatten(), wrt)
    )
    print('\n')


with pm.Model() as model_test:
    
    beta = pm.Exponential(
        'beta',
        lam=0.5,
        testval=2.
    )
    
    ap = np.array([1.]) + beta
    bp = np.array([1.]) + beta
    
    # This is fine!
    print_value_theano(
        (ap*bp)**0.9,
        beta
    )
        
    zeros = tt.zeros(1)
    a = tt.concatenate((zeros, ap))
    b = tt.concatenate((zeros, bp))
    
    # This is fine!
    print_value_theano(
        (a*b)**0.9, 
        beta
    )
    
    # same result with: 
    # value = a[:,np.newaxis,np.newaxis] * b[:,np.newaxis,np.newaxis,np.newaxis]
    value = tt.shape_padright(a, 2) * tt.shape_padright(b, 4)
    
    # The gradient of this is all NaN!!
    print_value_theano(
        value**0.9, 
        beta
    )

This prints out:

value __str__ = [7.22467406]
grad __str__ = [4.33480443]


value __str__ = [0.         7.22467406]
grad __str__ = [0.         4.33480443]


value __str__ = [[[[[0.        ]]
   [[0.        ]]]]
 [[[[0.        ]]
   [[7.22467406]]]]]
grad __str__ = [nan nan nan nan]

Note that (a*b)**0.9 has a well-defined gradient, but the padding breaks things. Maybe this is still related to the gradient itself, but I don't see how the padding would make a difference here. Thanks again for the help!

Okay, I think you were right, and this is essentially the issue:

def print_value_theano(value, wrt):
    tt.printing.Print('value')(
        value
    )
    tt.printing.Print('grad')(
        tt.jacobian(value.flatten(), wrt)
    )
    print('\n')


with pm.Model() as model_test:
    
    beta = pm.Exponential(
        'beta',
        lam=0.5,
        testval=0.5
    )
    
    value = tt.concatenate(([0.], [beta]))

    print_value_theano(
        value**0.9, 
        beta
    )
    
    value = np.array([0., 1])*beta
    # The gradient of this is all NaN!!
    print_value_theano(
        value**0.9, 
        beta
    )
    
    arr = np.array([0., 1])
    value = tt.switch(arr > 0, arr*beta, 0)
    print_value_theano(
        value**0.9, 
        beta
    )

This prints out:

value __str__ = [0.         0.53588673]
grad __str__ = [0.         0.96459612]


value __str__ = [0.         0.53588673]
grad __str__ = [nan nan]


value __str__ = [0.         0.53588673]
grad __str__ = [0.         0.96459612]
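
I believe what happens in the middle (NaN) case is: the pow gradient produces inf * 0 = NaN at the zero entry, and that NaN then poisons the sum inside the gradient of the multiplication by beta, so every output's gradient comes out NaN. Mimicking the backward pass for output element 1 in plain NumPy (my assumption about the graph Theano builds, not verified):

import numpy as np

beta = 0.5
arr = np.array([0., 1.])
x = arr * beta                # forward: [0. , 0.5]

gz = np.array([0., 1.])       # gradient seed for output element 1
g_pow = 0.9 * x ** -0.1 * gz  # pow backward: [inf * 0, ...] -> [nan, 0.96459...]
g_beta = (g_pow * arr).sum()  # mul backward w.r.t. the scalar beta
print(g_beta)                 # nan -- one poisoned entry ruins the sum

The finite entry (0.96459...) matches the gradient from the switch version. With tt.switch the NaN never appears, presumably because the gradient of switch is itself a switch that inserts an exact zero on the false branch instead of multiplying by zero.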

The problem now is that even with the third version using switch (which keeps only the non-NaN gradients), PyMC3 seems to refuse to sample, and I can't use the first version because the zeros actually sit inside a big array that I can't construct on the fly.
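
Edit: the usual workaround seems to be the "double switch" trick, applying the switch both inside and outside the pow so that the gradient never evaluates x ** -0.1 at x = 0. A sketch of what I mean (hypothetical, not yet tested in my full model):

arr = np.array([0., 1.])
mask = arr > 0
inner = tt.switch(mask, arr*beta, 1.)    # 1. is a harmless dummy where arr == 0
value = tt.switch(mask, inner**0.9, 0.)  # the pow gradient never sees x == 0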