I defined a theano operation for the log-likelihood and its gradient, following the "Using a blackbox likelihood function" example in the official documentation.
Both the log-likelihood log(X) and the gradient (1 / X) * ∂X/∂θ depend on the same intermediates Y, which take considerable time to calculate.
Can I cache the value of Y to avoid calculating it twice? I plan to use the NUTS sampler.
I would appreciate any pointers on this matter.
Probably I can do the following to avoid evaluating the same Y twice:
- Wrap the intermediate calculations Y into an operation without gradient.
- Create an operation A without a gradient, which calculates the log-likelihood log(X) using the operation Y.
- Create an operation B without a gradient, which calculates the gradient of the log-likelihood (1 / X) * ∂X/∂θ using the operation Y again.
- Create an operation C with a gradient, which returns the log-likelihood and whose gradient is given by operation B. Operation C depends on operations A and B.
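In plain Python (not actual theano Op code), the sharing I have in mind could be sketched as follows. All names here are hypothetical, and the single-entry cache stands in for the wrapped operation Y:

```python
import math

class CachedY:
    """Single-entry cache: recompute Y only when theta changes.
    Illustrative stand-in for an operation wrapping the intermediates."""
    def __init__(self):
        self.n_evals = 0       # counts expensive evaluations of Y
        self._key = None
        self._value = None

    def __call__(self, theta):
        key = tuple(theta)     # value-based key, no graph identity needed
        if key != self._key:
            self.n_evals += 1
            # placeholder for the expensive intermediate calculation
            self._value = [t ** 2 + 1.0 for t in theta]
            self._key = key
        return self._value

Y = CachedY()

def logp(theta):               # "operation A": log-likelihood log(X)
    X = sum(Y(theta))
    return math.log(X)

def grad_logp(theta):          # "operation B": (1 / X) * dX/dtheta
    y = Y(theta)
    X = sum(y)
    return [2.0 * t / X for t in theta]  # dX/dtheta_i = 2 * theta_i here

theta = [0.5, 1.5]
logp(theta)
grad_logp(theta)               # reuses the cached Y: still one evaluation
assert Y.n_evals == 1
```

If the sampler asks for the log-likelihood and the gradient at the same θ, the expensive part would run only once per point.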
But when I define a custom theano operation, the methods of my operation (e.g. perform) seem to receive numpy arrays rather than theano tensors as arguments. While I imagine theano tensors are associated with a hash in the computation graph, plain numpy arrays would not carry such a hash.
Can my strategy really avoid evaluating the computation in operation Y twice?
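One workaround I am considering is to key the cache on the parameter values themselves rather than on graph identity. Since the methods receive raw arrays, their bytes can serve as the key; here is a stdlib-only sketch with hypothetical names (with a numpy input one would hash theta.tobytes() the same way):

```python
import hashlib
from array import array

_cache = {}

def y_key(theta):
    """Value-based cache key from the raw bytes of a double array."""
    return hashlib.sha1(array("d", theta).tobytes()).hexdigest()

def cached_Y(theta, compute):
    """Return the intermediates Y, computing them at most once per theta."""
    key = y_key(theta)
    if key not in _cache:
        _cache[key] = compute(theta)  # the expensive part runs only here
    return _cache[key]

calls = []
def expensive(theta):
    calls.append(1)                   # track how often Y is really computed
    return [t + 1.0 for t in theta]

cached_Y([0.1, 0.2], expensive)
cached_Y([0.1, 0.2], expensive)       # cache hit: identical bytes
assert len(calls) == 1
```

Whether this is safe presumably depends on the sampler passing bit-identical θ to the log-likelihood and gradient calls, which is what I am unsure about.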
How often does NUTS ask for the log-likelihood and for the gradient?