Perhaps I can do the following to avoid evaluating the same Y twice (a rough code sketch follows the list).
- Wrap the intermediate calculation Y in an operation without a gradient.
- Create an operation A without a gradient that calculates the log-likelihood log(X) from the output of operation Y.
- Create an operation B without a gradient that calculates the gradient of the log-likelihood, (1 / X) * ∂X/∂θ, from the output of the same operation Y.
- Create an operation C with a gradient, covering both the log-likelihood and its gradient. Operation C depends on operations A and B.
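Here is a rough sketch of how I imagine this could look. All the names (`expensive_model`, `YOp`, `LogpOp`) are made up, and I have folded operations A and B into ordinary tensor expressions, since `log` and division are already differentiable; the `connection_pattern` / `DisconnectedType` wiring is just my guess at how to stop Theano from trying to differentiate through Y:

```python
import numpy as np
import theano
import theano.tensor as tt
from theano.gradient import DisconnectedType


def expensive_model(theta):
    # Placeholder for the costly intermediate calculation Y: it returns X and
    # dX/dtheta together, so the heavy work happens in a single call.
    X = np.sum(theta ** 2) + 1.0
    dX_dtheta = 2.0 * theta
    return X, dX_dtheta


class YOp(tt.Op):
    # Operation Y: no gradient, one evaluation yields both X and dX/dtheta.
    itypes = [tt.dvector]              # theta
    otypes = [tt.dscalar, tt.dvector]  # X, dX/dtheta

    def perform(self, node, inputs, outputs):
        theta, = inputs                # numpy array at run time
        X, dX = expensive_model(theta)
        outputs[0][0] = np.asarray(X, dtype="float64")
        outputs[1][0] = np.asarray(dX, dtype="float64")


class LogpOp(tt.Op):
    # Operation C: returns log(X) and supplies the precomputed gradient.
    itypes = [tt.dvector, tt.dscalar, tt.dvector]  # theta, X, (1/X) * dX/dtheta
    otypes = [tt.dscalar]

    def perform(self, node, inputs, outputs):
        _, X, _ = inputs
        outputs[0][0] = np.asarray(np.log(X), dtype="float64")

    def connection_pattern(self, node):
        # Only theta is treated as differentiable; X and the precomputed
        # gradient enter as constants, so Theano never differentiates YOp.
        return [[True], [False], [False]]

    def grad(self, inputs, output_grads):
        _, _, dlogp = inputs           # symbolic variables here
        g, = output_grads
        return [g * dlogp, DisconnectedType()(), DisconnectedType()()]


theta = tt.dvector("theta")
X, dX = YOp()(theta)                   # a single symbolic Y node
dlogp = dX / X                         # "operation B" as ordinary tensor algebra
logp = LogpOp()(theta, X, dlogp)       # "operation C"

# When the log-likelihood and its gradient are compiled into one function,
# the shared YOp node is evaluated once per call.
f = theano.function([theta], [logp, tt.grad(logp, theta)])
print(f(np.array([1.0, 2.0])))
```

My hope is that, because the value and the gradient both reference the same YOp node, the compiled function evaluates Y only once, provided the log-likelihood and its gradient actually end up in the same compiled function.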
However, when I define a custom Theano operation, the methods of my custom operation seem to take numpy arrays rather than Theano tensors as arguments. I imagine Theano tensors carry a hash that identifies them in the computation graph, whereas the numpy arrays would not have such a hash.
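For what it's worth, my current understanding (which may be wrong) is that only `perform()` receives numpy arrays, at run time, while `make_node()` and `grad()` receive symbolic variables at graph-construction time. A toy op (name made up) showing the split:

```python
import numpy as np
import theano.tensor as tt


class SquaredNormOp(tt.Op):
    itypes = [tt.dvector]
    otypes = [tt.dscalar]

    def perform(self, node, inputs, outputs):
        # Run time: `inputs` holds plain numpy arrays.
        theta, = inputs
        outputs[0][0] = np.asarray(np.sum(theta ** 2), dtype="float64")

    def grad(self, inputs, output_grads):
        # Graph-construction time: `inputs` and `output_grads` are symbolic
        # Theano variables, not numpy arrays.
        theta, = inputs
        g, = output_grads
        return [g * 2.0 * theta]
```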
Can my strategy really avoid evaluating the computation in operation Y twice?
How often does NUTS ask for the log-likelihood and for the gradient?