You will need to use some code involving tf.gradients / tf.GradientTape (see more about these here). The blog post I linked to above uses tf.gradients, so I think your surest bet would be to mimic that code. The idea here is that your ML model is just some function F: \Theta \to \mathbb{R}^d, and your grad method needs to compute \frac{\partial F}{\partial \theta}, which is exactly what tf.gradients gives you.
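To make that concrete, here is a minimal sketch of computing \frac{\partial F}{\partial \theta} with tf.GradientTape, assuming a toy model F(\theta) = (\theta^2, \sin\theta) as a stand-in for your actual network:

import tensorflow as tf

# Toy stand-in for an ML model: F(theta) = [theta**2, sin(theta)],
# i.e. a map from R to R^2
theta = tf.Variable(1.5, dtype=tf.float64)

with tf.GradientTape() as tape:
    F = tf.stack([theta ** 2, tf.sin(theta)])

# dF/dtheta, a length-2 vector: [2*theta, cos(theta)]
dF_dtheta = tape.jacobian(F, theta)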
Maybe this will make the grad Op a little less daunting. Here’s a bare-bones example of a grad Op that I’ve used before:
import numpy as np
import theano.tensor as tt

class GradOp(tt.Op):
    itypes = [tt.dscalar, tt.dvector]  # First input is theta, second is the gradient seed
    otypes = [tt.dscalar]

    def __init__(self, nn_model):
        self.nn_model = nn_model

    def perform(self, node, inputs, outputs):
        theta, g = inputs
        # Vector-Jacobian product: dF/dtheta dotted with the incoming seed
        result = np.float64(np.squeeze(self.nn_model.derivative(theta)))
        outputs[0][0] = np.dot(result, g)
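For context, here is a hypothetical sketch of the outer Op that would hand its gradient seed to GradOp from its grad method; the name ModelOp and the method nn_model.predict are mine, not from the post, but the grad hookup is the standard Theano pattern:

class ModelOp(tt.Op):
    itypes = [tt.dscalar]  # theta
    otypes = [tt.dvector]  # F(theta), a vector in R^d

    def __init__(self, nn_model):
        self.nn_model = nn_model
        self.grad_op = GradOp(nn_model)

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        # predict() is a stand-in for however your model evaluates F(theta)
        outputs[0][0] = np.asarray(self.nn_model.predict(theta), dtype=np.float64)

    def grad(self, inputs, output_grads):
        (theta,) = inputs
        # Theano supplies the gradient seed symbolically; GradOp turns it
        # into the vector-Jacobian product dot(dF/dtheta, seed)
        return [self.grad_op(theta, output_grads[0])]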
The difference between this and the grad Op in that post is that, in reality, getting the derivative out of TensorFlow isn’t as simple as calling something like nn_model.derivative(). The grad Op from the post is:
class _TensorFlowGradOp(tt.Op):
    """A custom Theano Op defining the gradient of a TensorFlowOp

    Args:
        base_op (TensorFlowOp): The original Op

    """

    def __init__(self, base_op):
        self.base_op = base_op

        # Build the TensorFlow operation to apply the reverse mode
        # autodiff for this operation
        # The placeholder is used to include the gradient of the
        # output as a seed
        self.dy = tf.placeholder(tf.float64, base_op.output_shape)
        self.grad_target = tf.gradients(base_op.target,
                                        base_op.parameters,
                                        grad_ys=self.dy)

        # This operation will take the original inputs and the gradient
        # seed as input
        types = [_to_tensor_type(shape) for shape in base_op.shapes]
        self.itypes = tuple(types + [_to_tensor_type(base_op.output_shape)])
        self.otypes = tuple(types)

    def infer_shape(self, node, shapes):
        return self.base_op.shapes

    def perform(self, node, inputs, outputs):
        feed_dict = dict(zip(self.base_op.parameters, inputs[:-1]),
                         **self.base_op._feed_dict)
        feed_dict[self.dy] = inputs[-1]
        result = self.base_op.session.run(self.grad_target, feed_dict=feed_dict)
        for i, r in enumerate(result):
            outputs[i][0] = np.array(r)
You can think of the extra code in the __init__ and perform methods as the overhead needed to actually fetch the gradients from TensorFlow.
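As an aside, if you end up on TensorFlow 2, the eager-mode analogue of that tf.gradients(..., grad_ys=self.dy) call is tf.GradientTape with its output_gradients argument. A minimal sketch, where model_fn and parameters are placeholders for a function computing your Op’s output and the list of tf.Variables it depends on:

import tensorflow as tf

def vjp(model_fn, parameters, dy):
    # Run the forward pass under a tape, then seed the reverse pass
    # with dy; output_gradients plays the role of grad_ys above
    with tf.GradientTape() as tape:
        target = model_fn(parameters)
    return tape.gradient(target, parameters, output_gradients=dy)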
p.s. If you are able to share your code through something like a Google Colab notebook, I’d be happy to help you get this working.