Thanks! I think I solved the original problem, but I've run into another one where something stranger seems to be going on. Here is some more code:
import numpy as np
import pymc3 as pm
import theano.tensor as tt


def print_value_theano(value, wrt):
    tt.printing.Print('value')(value)
    tt.printing.Print('grad')(tt.jacobian(value.flatten(), wrt))
    print('\n')
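If I understand correctly, the Print ops fire at graph-construction time because PyMC3 enables compute_test_value inside the model context; the same numbers can also be read off the test values directly, e.g. inside the helper:

    print(tt.jacobian(value.flatten(), wrt).tag.test_value)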
with pm.Model() as model_test:
    beta = pm.Exponential('beta', lam=0.5, testval=2.)
    ap = np.array([1.]) + beta
    bp = np.array([1.]) + beta

    # This is fine!
    print_value_theano((ap * bp)**0.9, beta)

    zeros = tt.zeros(1)
    a = tt.concatenate((zeros, ap))
    b = tt.concatenate((zeros, bp))

    # This is also fine!
    print_value_theano((a * b)**0.9, beta)

    # Same result with:
    # value = a[:, np.newaxis, np.newaxis] * b[:, np.newaxis, np.newaxis, np.newaxis]
    value = tt.shape_padright(a, 2) * tt.shape_padright(b, 4)

    # The gradient of this is all NaN!
    print_value_theano(value**0.9, beta)
This prints out:
value __str__ = [7.22467406]
grad __str__ = [4.33480443]

value __str__ = [0.         7.22467406]
grad __str__ = [0.         4.33480443]

value __str__ = [[[[[0.        ]]

   [[0.        ]]]]


 [[[[0.        ]]

   [[7.22467406]]]]]
grad __str__ = [nan nan nan nan]
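Playing with plain NumPy suggests where the nan could come from. This is a standalone sketch of my guess, not the actual Theano gradient graph, and the arrays are stand-ins:

    import numpy as np

    a = np.array([0., 3.])  # stand-in for concatenate((zeros, ap))
    b = np.array([0., 3.])  # stand-in for concatenate((zeros, bp))

    # The padded product broadcasts to shape (2, 1, 2, 1, 1): every element
    # of a now multiplies every element of b, including the padded zeros.
    padded = a[:, None, None] * b[:, None, None, None, None]
    print(padded.shape)  # (2, 1, 2, 1, 1)

    # The derivative of x**0.9 is 0.9 * x**(-0.1), which is inf at x = 0.
    # In the chain rule that inf gets multiplied by the zero coming from
    # the other factor, and inf * 0 = nan:
    x = np.float64(0.)
    print(0.9 * x**-0.1)       # inf (NumPy warns about the divide by zero)
    print(0.9 * x**-0.1 * 0.)  # nan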
Note that (a*b)**0.9 has a well-defined gradient; it is the padding that breaks things. My guess, based on the NumPy check above, is that after broadcasting the nonzero entries also get multiplied with the padded zeros, the gradient of x**0.9 is infinite at x = 0, and the resulting inf * 0 = nan then spreads through the summed gradient. But I'm not sure that is really what happens inside Theano, and I still don't see why the zero entries are handled fine in the unpadded case but not after the padding. A possible workaround is sketched below in case it helps.
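If it is the inf * 0 issue, the usual double-switch trick for Theano gradients might avoid it. An untested sketch (is_zero, safe_value and powed are my names):

    # Keep the pow (and hence its gradient) away from the zero entries:
    is_zero = tt.eq(value, 0.)
    safe_value = tt.switch(is_zero, 1., value)    # harmless placeholder
    powed = tt.switch(is_zero, 0., safe_value**0.9)

A single switch is reportedly not enough, because the gradient of the discarded branch is still evaluated at x = 0.

Thanks again for the help!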