Gradient is NaN whenever the exponent is < 1

I’ve not had the chance to look closely yet, but have you tried computing these gradients by hand? If the exponent p is smaller than 1, the derivative of x^p is p * x^(p - 1), which has a negative exponent, i.e. a division by a power of x. Since some of your input values are zero, I wonder if there is a division by zero somewhere in the backward pass. If that’s the case, the NaNs should also disappear for inputs that contain no zeros.
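
To make the by-hand check concrete, here is a minimal sketch in plain Python. It is not your actual code: `power_grad`, the sample `inputs`, and the exponents are hypothetical names and values I picked for illustration, and I am assuming the operation in question is an elementwise power `x**p` with non-negative inputs. Plain Python raises `ZeroDivisionError` for `0.0 ** -0.5`, so the sketch returns `inf` in that case to mimic what array libraries typically do.

```python
import math

def power_grad(x: float, p: float) -> float:
    """Hand-derived d/dx of x**p, i.e. p * x**(p - 1), assuming x >= 0 and p > 0."""
    if x == 0.0 and p < 1.0:
        # x**(p - 1) equals 1 / 0**(1 - p) here: a division by zero.
        # Assumption: mimic array libraries, which return inf rather than raising.
        return math.inf
    return p * x ** (p - 1.0)

# Hypothetical inputs, including a zero like the ones in your data.
inputs = [0.0, 0.5, 2.0]
for p in (0.5, 1.0, 2.0):
    grads = [power_grad(x, p) for x in inputs]
    print(f"p={p}: {grads}")

# Chain rule: if the upstream gradient at that element is 0, the product
# 0 * inf is nan under IEEE 754 floating-point arithmetic.
upstream = 0.0
nan_grad = upstream * power_grad(0.0, 0.5)
print(math.isnan(nan_grad))
```

If this is what is happening, only the exponent below 1 produces a non-finite gradient at the zero input, and the `inf` becomes `nan` as soon as it is multiplied by a zero further along the chain rule. Removing the zeros from the input, or adding a small epsilon before taking the power, would be a quick way to confirm.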