Gradient is NaN whenever the exponent is < 1

Hi everyone,

I am getting some NaN gradients and I cannot see the reason for them. Here is a minimal working example:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model_test:
    
    beta = pm.Exponential(
        'beta',
        lam=0.5
    )

    array = np.array([0., 0., 1.])
    
    probs = (1-array[:-1])**beta - (1-array[1:])**beta
    
    # I get NaN gradients whenever I try to exponentiate
    # by a number below 1
    value = probs**0.9
    
    tt.printing.Print('value')(
        value.flatten()
    )
    tt.printing.Print('grad')(
        tt.jacobian(value.flatten(), beta)
    )

The output of this is:

value __str__ = [0. 1.]
grad __str__ = [nan nan]

On the other hand, changing the exponent from 0.9 to 1.1 prints out:

value __str__ = [0. 1.]
grad __str__ = [0. 0.]

Any help appreciated! Thanks.

I’ve not had the chance to look closely yet, but have you tried computing these gradients by hand? If the exponent is smaller than 1, perhaps the gradient involves a negative exponent, i.e. a division, and since some of your input values are zero, I wonder if there could be a division by zero problem somewhere. If that’s the case, it might also disappear for some other input values.
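
Concretely, here is a quick chain-rule check in plain NumPy (this just mimics what I suspect Theano computes for the pow gradient, it is not literally the Theano graph): the derivative of p**0.9 involves p**(-0.1), which is infinite at p = 0, and inf times a zero inner derivative gives NaN.

import numpy as np

p = np.array(0.)   # the problematic entry of probs
dp = np.array(0.)  # its derivative w.r.t. beta

# chain rule for p**0.9: the inner exponent goes negative
print(0.9 * p ** -0.1 * dp)  # 0.9 * inf * 0 -> nan (plus a divide warning)

# for p**1.1 the inner exponent stays positive
print(1.1 * p ** 0.1 * dp)   # 1.1 * 0 * 0 -> 0.0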

Thanks! I think I solved the original problem, but I have run into another issue where something stranger might be going on. Here is some more code:

def print_value_theano(value, wrt):
    # Helper: print a tensor's value and its Jacobian w.r.t. `wrt`.
    tt.printing.Print('value')(
        value
    )
    tt.printing.Print('grad')(
        tt.jacobian(value.flatten(), wrt)
    )
    print('\n')


with pm.Model() as model_test:
    
    beta = pm.Exponential(
        'beta',
        lam=0.5,
        testval=2.
    )
    
    ap = np.array([1.]) + beta
    bp = np.array([1.]) + beta
    
    # This is fine!
    print_value_theano(
        (ap*bp)**0.9,
        beta
    )
        
    zeros = tt.zeros(1)
    a = tt.concatenate((zeros, ap))
    b = tt.concatenate((zeros, bp))
    
    # This is fine!
    print_value_theano(
        (a*b)**0.9, 
        beta
    )
    
    # same result with: 
    # value = a[:,np.newaxis,np.newaxis] * b[:,np.newaxis,np.newaxis,np.newaxis]
    value = tt.shape_padright(a, 2) * tt.shape_padright(b, 4)
    
    # The gradient of this is all NaN!!
    print_value_theano(
        value**0.9, 
        beta
    )

This prints out:

value __str__ = [7.22467406]
grad __str__ = [4.33480443]


value __str__ = [0.         7.22467406]
grad __str__ = [0.         4.33480443]


value __str__ = [[[[[0.        ]]
   [[0.        ]]]]
 [[[[0.        ]]
   [[7.22467406]]]]]
grad __str__ = [nan nan nan nan]

Note that (a*b)**0.9 has a well-defined gradient, but the padding breaks things. Maybe this is still related to the gradient itself, but I don't see how the padding would make a difference here. Thanks again for the help!

Okay, I think you were right, and this is essentially the issue:

def print_value_theano(value, wrt):
    tt.printing.Print('value')(
        value
    )
    tt.printing.Print('grad')(
        tt.jacobian(value.flatten(), wrt)
    )
    print('\n')


with pm.Model() as model_test:
    
    beta = pm.Exponential(
        'beta',
        lam=0.5,
        testval=0.5
    )
    
    value = tt.concatenate(([0.], [beta]))

    print_value_theano(
        value**0.9, 
        beta
    )
    
    value = np.array([0., 1])*beta
    # The gradient of this is all NaN!!
    print_value_theano(
        value**0.9, 
        beta
    )
    
    arr = np.array([0., 1])
    value = tt.switch(arr > 0, arr*beta, 0)
    print_value_theano(
        value**0.9, 
        beta
    )

This prints out:

value __str__ = [0.         0.53588673]
grad __str__ = [0.         0.96459612]


value __str__ = [0.         0.53588673]
grad __str__ = [nan nan]


value __str__ = [0.         0.53588673]
grad __str__ = [0.         0.96459612]
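
I believe what happens in the middle (NaN) case is: the pow gradient produces inf * 0 = NaN at the zero entry, and that NaN then poisons the sum inside the gradient of the multiplication by beta, so every output's gradient comes out NaN. Mimicking the backward pass for output element 1 in plain NumPy (my assumption about the graph Theano builds, not verified):

import numpy as np

beta = 0.5
arr = np.array([0., 1.])
x = arr * beta                # forward: [0. , 0.5]

gz = np.array([0., 1.])       # gradient seed for output element 1
g_pow = 0.9 * x ** -0.1 * gz  # pow backward: [inf * 0, ...] -> [nan, 0.96459...]
g_beta = (g_pow * arr).sum()  # mul backward w.r.t. the scalar beta
print(g_beta)                 # nan -- one poisoned entry ruins the sum

The finite entry (0.96459...) matches the gradient from the switch version. With tt.switch the NaN never appears, presumably because the gradient of switch is itself a switch that inserts an exact zero on the false branch instead of multiplying by zero.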

The problem now is that even with the third version using switch (which keeps only the non-NaN gradients), PyMC3 seems to refuse to sample, and I can't use the first version because the zeros actually sit inside a big array that I can't construct on the fly.
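
Edit: the usual workaround seems to be the "double switch" trick, applying the switch both inside and outside the pow so that the gradient never evaluates x ** -0.1 at x = 0. A sketch of what I mean (hypothetical, not yet tested in my full model):

arr = np.array([0., 1.])
mask = arr > 0
inner = tt.switch(mask, arr*beta, 1.)    # 1. is a harmless dummy where arr == 0
value = tt.switch(mask, inner**0.9, 0.)  # the pow gradient never sees x == 0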