Hi! I’m new to PyMC3 and Theano, and hope you can help me with implementing a model that I’m struggling with. I think I’m misunderstanding something fundamental about Theano.
I am trying to use PyMC3 to sample a chi-square function in order to fit a model with some parameters to a data set. My chi-square calculation involves calls to an external program, so during each chi-square evaluation I need to write the current parameters to file and call the external program. Following the PyMC3 docs, I have tried to implement this using a custom Theano Op: http://docs.pymc.io/advanced_theano.html
This works fine for the chi-square evaluation itself (in the perform() method of the Theano Op class). However, in the grad() method I run into trouble: when I try to write the contents of the input argument to file, I get an error because the input is not a numpy array but a Theano tensor. I see in the Theano docs (http://deeplearning.net/software/theano/extending/extending_theano.html#op-s-basic-methods) that this is how it’s meant to be:
For perform(), “inputs is a list of references to data which can be operated on using non-symbolic statements, (i.e., statements in Python, Numpy).”
While for grad(), “It takes two arguments inputs and output_gradients which are both lists of symbolic Theano Variables and those must be operated on using Theano’s symbolic language.”
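To make this concrete, my Op has roughly the following structure (the file names, the external-program call and the way I read its result back are just placeholders for what my real code does):

import subprocess

import numpy as np
import theano.tensor as tt
from theano.tensor import Op


class ChiSquareOp(Op):
    # One vector of model parameters in, the scalar chi-square out
    itypes = [tt.dvector]
    otypes = [tt.dscalar]

    def perform(self, node, inputs, outputs):
        (params,) = inputs  # here params really is a numpy array, so this all works
        np.savetxt("params.txt", params)                       # placeholder file name
        subprocess.run(["my_external_program", "params.txt"], check=True)  # placeholder program
        chi2 = float(np.loadtxt("chi2_result.txt"))            # placeholder output file
        outputs[0][0] = np.array(chi2)

    def grad(self, inputs, output_grads):
        # This is where I get stuck: inputs here are symbolic Theano variables,
        # so I cannot simply write them to file the way I do in perform().
        raise NotImplementedError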
My question is how to write the inputs to file from inside the grad() method. What I’ve done so far is very hacky: I use a shared variable to hold the gradient and update it every time perform() is called; grad() then simply returns the current value of the shared variable. But I’m scared of this “solution”, because I suppose I can’t assume that perform() is always called with the same parameters right before grad().
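For completeness, the hack looks roughly like this (continuing the sketch above; the gradient file name is again a placeholder, and in reality the gradient comes out of the same external-program run):

import numpy as np
import theano

# Shared variable holding the most recently computed gradient
# (length = number of model parameters; 3 is just illustrative)
current_grad = theano.shared(np.zeros(3), name="current_grad")


class ChiSquareOpHack(ChiSquareOp):
    def perform(self, node, inputs, outputs):
        super().perform(node, inputs, outputs)
        # Placeholder: read the gradient that the external program wrote alongside chi2
        grad_from_file = np.loadtxt("grad_result.txt")
        current_grad.set_value(grad_from_file)

    def grad(self, inputs, output_grads):
        (g,) = output_grads
        # Just hand back whatever perform() stored last. This is the fragile part:
        # nothing guarantees perform() was just called with these same parameters.
        return [g * current_grad]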
Thanks for the reply! Yes, the gradient also depends on information from the external program.
Why wouldn’t Theano allow that, if the functions themselves are allowed to call external programs? I would think it’s usually the case that the gradient has similar dependencies as the function. But Theano is still very much a black box to me.
Thank you so much! I am new to this way of thinking (and the terminology), so please bear with me: “forward” means the function itself, and “backward” is its gradient? At first I got confused and wondered if you meant the “upper” and “lower” assignments in the example, but I gather those are just details specific to that problem.
I should be able to implement this without too much rewriting! Thanks again!
I did, actually. But then the project stopped for other reasons, so I haven’t looked at it in a while. Going back to it now, I see that I have a program that works (i.e. it runs, and passes all numerical gradient tests), but is rather ugly. Stripping it down to something I’m comfortable with posting publicly would take a bit of work, but if you like I could send it to you privately? Then maybe you can make a minimal working example to post here
Great. If you could send it to me privately that would be really nice. If you just google Matt Pitkin University of Glasgow you’ll find my email address easily.
In case it is of general interest, I’ve taken the example that @jorgenem sent me and created a blog post that tries to show a worked example of using a likelihood that is defined by an external code in PyMC3. If anyone spots any problems with my example please let me know.
@junpenglao Yes, I’d be happy to put this in a form that could go in the pymc3 docs. Should I just fork the pymc3 github repo, add the notebook to docs/source/notebooks and submit a PR?
I know this is an old topic, but I’m trying to implement the grad method of a Theano Op whose perform method returns a vector, not a scalar. More specifically, the Op operates on a dataset that can have one or more rows and it returns a single value for each row of the dataset (i.e., the output is an Mx1 matrix given an MxN input dataset). The Jacobian in this case will have M rows and P columns, where P is the number of variables in the model.
In my problem, I have a single variable and the dataset has 53 rows and 1 column. I’ve tried a bunch of different things, but I can’t get it to work. It keeps complaining that my input size is 1 (i_shape) while the gradient has a size of 53 (t_shape), in the theano/gradient.py file.
where self.model_grad calls the gradients function that @mattpitkin explained. Any suggestions? Note that I’m not implementing a log-likelihood Op that returns a single value given a dataset. I want my Op to return one output per row of a dataset.
Maybe I don’t fully understand how this method works. The code above returns a tensor with a single element, but I thought it should’ve returned a vector.
At first glance that seems like it’s just returning the first element of the vector that you actually want ( tt.dot(grad_outputs, J) )? Hopefully someone else can answer that.
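That said, my rough understanding is that for a vector-output Op the grad() method should return the vector-Jacobian product, which has the shape of the input rather than of the output (that is what the i_shape vs t_shape check in theano/gradient.py is about). Using your self.model_grad as the thing that produces the M x P Jacobian (just my guess at how it is wired up), I would expect something like:

def grad(self, inputs, output_grads):
    (theta,) = inputs        # the P model parameters (P = 1 in your case)
    (g,) = output_grads      # symbolic vector of length M, one entry per output row
    J = self.model_grad(theta)   # symbolic M x P Jacobian (however you compute it)
    # g^T J has length P, i.e. the same shape as theta, which is what Theano expects
    return [tt.dot(g, J)]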
My main reason for chiming in is that I recently wrote an Op that sounds similar to what you’re trying to do, so I thought I’d share. In my case, I have trained a neural network to solve an ODE at N time points and I need the gradient of each of these points w.r.t. the model parameter \theta (1-dimensional). So my perform method returns a vector like yours does. Admittedly I couldn’t figure out why my gradient Op needed to output just a scalar, but my PyMC3 model seems to work fine and recovers the true (known) parameters in the posterior, so I think it’s correct? ¯\_(ツ)_/¯
Here’s my Op:
import numpy as np
import theano.tensor as tt
from theano.tensor import Op


# Define the Theano Op for computing the gradient w.r.t. theta
class ODEGradop(Op):
    itypes = [tt.dscalar, tt.dvector]  # first input is theta, second is the upstream gradient
    otypes = [tt.dscalar]  # I'm not sure why this is a scalar but it makes my code work :)

    def __init__(self, pydens_model, t_obs):
        self.pydens_model = pydens_model
        # Set up the input for the neural net
        self.t_comp = np.stack([t_obs, np.zeros_like(t_obs)]).T
        self.theta_comp = np.stack([np.zeros_like(t_obs), np.ones_like(t_obs)]).T

    def perform(self, node, inputs, outputs):
        theta, g = inputs
        nn_input = self.t_comp + theta * self.theta_comp
        # d(solution)/d(theta) at each observation time
        result = np.float64(np.squeeze(self.pydens_model.solve(nn_input, fetches="dxdalpha")))
        outputs[0][0] = np.dot(result, g)


# Define the Theano Op for querying the solution of the ODE for any input theta
class ODEop(Op):
    itypes = [tt.dscalar]
    otypes = [tt.dvector]

    def __init__(self, pydens_model, t_obs):
        self.pydens_model = pydens_model
        self.t_obs = t_obs
        # Set up the input for the neural net
        self.t_comp = np.stack([t_obs, np.zeros_like(t_obs)]).T
        self.theta_comp = np.stack([np.zeros_like(t_obs), np.ones_like(t_obs)]).T

    def perform(self, node, inputs, outputs):
        theta, = inputs
        nn_input = self.t_comp + theta * self.theta_comp
        result = np.float64(np.squeeze(self.pydens_model.solve(nn_input)))
        outputs[0][0] = result

    def grad(self, inputs, output_grads):
        x, = inputs
        g, = output_grads
        # Delegate the gradient computation to the separate gradient Op defined above
        grad_op = ODEGradop(self.pydens_model, self.t_obs)
        grad_op_apply = grad_op(x, g)
        return [grad_op_apply]
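And roughly how I hook it up in my PyMC3 model (the prior, noise level and data names here are just placeholders):

import pymc3 as pm

ode_op = ODEop(pydens_model, t_obs)

with pm.Model():
    theta = pm.Normal("theta", mu=0.0, sigma=1.0)         # placeholder prior
    mu = ode_op(theta)                                    # vector of ODE solutions at t_obs
    pm.Normal("obs", mu=mu, sigma=0.1, observed=y_obs)    # y_obs and sigma are placeholders
    trace = pm.sample(1000, tune=1000)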
Thanks! “Good” to know I’m not the only one confused by this! Yeah, it also seems to be working for me, but I don’t understand why. If the output of the main Op's perform is a vector, then the output of its grad should be a matrix, unless I’m missing something.