Creating Theano Op to work on vectors element-wise within Model()?

Hi everyone,

I am trying to use a custom theano.Op subclass in a model, and I am running into errors that I think are related to applying scalar Ops to vectors. I have tried wrapping my Op in theano.tensor.Elemwise, but without success so far. Below are more details of my problem and a minimal working example of what I’m trying to do.

Any help, comments, or questions would be greatly appreciated. I will be happy to provide any clarification or additional information you might need to help.

In particular, how does pymc3 make functions defined in terms of built-in Theano Ops work with the vector inputs and outputs required to add observed variables to a likelihood function? How does what I’m trying to do below differ from a function defined in terms of built-in Theano Ops, and how can I correct for this difference?

Thanks!

Jeff


I am trying to implement a deterministic function (a Fermi-Dirac integral) as part of my model; the function is defined through numerical integration. Its derivative is known, however, so I can implement the gradient and use NUTS, etc. in sampling. I have written a subclass of theano.Op that compiles and runs correctly outside of pymc3. Below is the code that defines my custom Op, followed by code using the Op in a pymc3 model.

# Import libraries required for minimal working example

import numpy as np
import mpmath

import theano
from theano import tensor as tt
import pymc3

# Custom Theano.Op subclass

class F(theano.Op):
    """
    Custom Theano Op using mpmath tanh-sinh quadrature integration.
    
    $ F(j, \eta) = \int_0^{\infty} \frac{x^j}{1 + e^{x - \eta}} \, dx $

    Parameters
    ----------
    j : scalar
        Index parameter. No gradient defined for this parameter.
    eta : scalar
        Continuous argument of function. Derivative is defined by
        $ \frac{d}{d\eta}F(j, \eta) = j F(j-1, \eta) $

    Returns
    -------
    f : scalar
        Output of integral.

    """
    # These parameters define a unique Op
    __props__ = ('maxdegree', 'upper_bound', 'cutoff')
    
    # Using itypes/otypes and letting theano deal with type checking
    itypes = [tt.dscalar, tt.dscalar]
    otypes = [tt.dscalar]
    
    def __init__(self, maxdegree=20, upper_bound=60., cutoff=15.):
        """
        Initialize the Op with specific values of integration control parameters.
        
        Parameters
        ----------
        maxdegree : int, optional
            Maximum degree of quadrature rule for mpmath.quad to try 
            before quitting. Defaults to 20.
        upper_bound : float, optional
            Constant added to eta to define the upper bound of the integral. 
            Defaults to 60.
        cutoff : float, optional
            Cutoff value to determine whether the integral should be 
            split into two pieces or not. Defaults to 15.            
        
        """
        super(F, self).__init__()
        
        self.maxdegree = maxdegree
        self.upper_bound = upper_bound
        self.cutoff = cutoff
      
    def perform(self, node, inputs, output_storage):
        """
        Numerically calculate the function.
        
        """
        # Convert inputs into types mpmath can read
        j, eta = map(float, inputs)
        # Define integrand
        def integrand(x):
            return mpmath.power(x, j)/(mpmath.exp(x - eta) + 1.)
        # Define bounds of integral
        if eta < self.cutoff:
            bounds = [0., eta + self.upper_bound]
        else:
            bounds = [0., eta, eta + self.upper_bound]
        # Perform integral
        integral = mpmath.quad(integrand, bounds, maxdegree=self.maxdegree)
        # Set output value
        output_storage[0][0] = np.array(float(integral))
        
    def grad(self, inputs, output_gradients):
        """
        Calculate the gradient.
        
        """
        j, eta = inputs
        g, = output_gradients
        
        # No gradient is defined with respect to j; for eta,
        # d/deta F(j, eta) = j * F(j - 1, eta). Using self (rather than
        # a fresh F()) keeps this Op's integration parameters.
        return [
            theano.gradient.grad_undefined(self, 0, j),
            g*j*self(j - 1., eta),
        ]

# Minimal working example making use of custom Theano.Op

# 5-dimensional input, with 10 rows of data
input_data = np.random.rand(10, 5)
# 1 output value for each row of input data
output_data = np.random.rand(10)

input_data = theano.shared(input_data)
output_data = theano.shared(output_data)

example = pymc3.Model()

with example:
    # model parameter priors
    standard_dev = pymc3.Exponential('standard_dev', lam=1.)
    
    coefficient_vector = pymc3.Uniform('coefficient_vector', 
                                       upper=400., 
                                       lower=-40., 
                                       shape=5,
                                      )
    # model
    input_vector = tt.dot(input_data, coefficient_vector)
    
    f = F()
    model = (2.*f(tt.constant(1., dtype=tt.dscalar), input_vector)
        /f(tt.constant(0., dtype=tt.dscalar), input_vector) - input_vector)
    
    # likelihood
    model_output = pymc3.Normal('model_output',
                           mu=model,
                           sd=standard_dev,
                           observed=output_data,
                          )

When I run this code, I get the following error from the pymc3 model:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-97255855fde9> in <module>()
     25     f = F()
     26     model = (2.*f(tt.constant(1., dtype=tt.dscalar), input_vector)
---> 27         /f(tt.constant(0., dtype=tt.dscalar), input_vector) - input_vector)
     28 
     29     # likelihood

/home/Envs/pymc3/lib/python2.7/site-packages/theano/gof/op.pyc in __call__(self, *inputs, **kwargs)
    613         """
    614         return_list = kwargs.pop('return_list', False)
--> 615         node = self.make_node(*inputs, **kwargs)
    616 
    617         if config.compute_test_value != 'off':

/home/Envs/pymc3/lib/python2.7/site-packages/theano/gof/op.pyc in make_node(self, *inputs)
    964             raise TypeError(
    965                 "We expected inputs of types '%s' but got types '%s' " %
--> 966                 (str(self.itypes), str([inp.type for inp in inputs])))
    967         return theano.Apply(self, inputs, [o() for o in self.otypes])
    968 

TypeError: We expected inputs of types '[TensorType(float64, scalar), TensorType(float64, scalar)]' 
but got types '[TensorType(float64, scalar), TensorType(float64, vector)]' 

This error comes from passing a vector of input data into the Op, rather than the scalar input the Op declares in its itypes. I am not sure how to properly apply my Op to each entry of the vector input_vector.

If I replace

f = F() 

with

f = tt.Elemwise(F())

in the above code block, wrapping my Op in Elemwise, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-64089d35d7db> in <module>()
     25     f = tt.Elemwise(F())
     26     model = (2.*f(tt.constant(1., dtype=tt.dscalar), input_vector)
---> 27         /f(tt.constant(0., dtype=tt.dscalar), input_vector) - input_vector)
     28 
     29     # likelihood

/home/Envs/pymc3/lib/python2.7/site-packages/theano/gof/op.pyc in __call__(self, *inputs, **kwargs)
    613         """
    614         return_list = kwargs.pop('return_list', False)
--> 615         node = self.make_node(*inputs, **kwargs)
    616 
    617         if config.compute_test_value != 'off':

/home/Envs/pymc3/lib/python2.7/site-packages/theano/tensor/elemwise.pyc in make_node(self, *inputs)
    576         inputs = list(map(as_tensor_variable, inputs))
    577         out_dtypes, out_broadcastables, inputs = self.get_output_info(
--> 578             DimShuffle, *inputs)
    579         outputs = [TensorType(dtype=dtype, broadcastable=broadcastable)()
    580                    for dtype, broadcastable in izip(out_dtypes,

/home/Envs/pymc3/lib/python2.7/site-packages/theano/tensor/elemwise.pyc in get_output_info(self, 
dim_shuffle, *inputs)
    518         shadow = self.scalar_op.make_node(
    519             *[get_scalar_type(dtype=i.type.dtype).make_variable()
--> 520               for i in inputs])
    521 
    522         target_length = max([input.type.ndim for input in inputs])

/home/Envs/pymc3/lib/python2.7/site-packages/theano/gof/op.pyc in make_node(self, *inputs)
    964             raise TypeError(
    965                 "We expected inputs of types '%s' but got types '%s' " %
--> 966                 (str(self.itypes), str([inp.type for inp in inputs])))
    967         return theano.Apply(self, inputs, [o() for o in self.otypes])
    968 

TypeError: We expected inputs of types '[TensorType(float64, scalar), TensorType(float64, scalar)]' 
but got types '[Scalar(float64), Scalar(float64)]'

For comparison, here is my custom Op working correctly on scalar inputs outside of pymc3:

# Demonstration of Theano Op working outside of pymc3
j = tt.dscalar('j')
eta = tt.dscalar('eta')
f = F()(j, eta)
compiled_f = theano.function([j, eta], f)

print compiled_f(0., 4.)
>>>>4.01814992792
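
For example, the gradient can be compiled and evaluated the same way (a quick sketch using the same j and eta variables defined above):

# Symbolic gradient with respect to eta, exercising the grad() method
df = theano.grad(f, eta)
compiled_df = theano.function([j, eta], df)
# Evaluates g*j*F(j-1, eta); j = 0 would give an identically zero gradient
print compiled_df(1., 4.)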

I have also been unable to get my custom Op to work with Elemwise() outside of pymc3.

Hi @jeffwdoak, you can have a look at @aseyboldt’s WIP notebook on implementing a theano.Op with a gradient method: https://gist.github.com/aseyboldt/57ebb66dd6597d6ea7a9d59f24f264e0

You are on the right track, but I wouldn’t try to use tt.Elemwise for this: Elemwise expects an op built on theano.scalar types, which is why your second traceback shows Scalar(float64) inputs being rejected by your TensorType itypes. Instead, just generalize F to itypes = [tt.dscalar, tt.dvector]. Elemwise is very useful for C Ops, since it handles a lot of details to keep the per-element overhead small, but that doesn’t really help a Python Op. You might also want to use numba for the integration; this can speed things up a great deal, and since the latest release some scipy integration routines can avoid the Python call overhead for numba-compiled functions (see here).
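
Something like this minimal sketch (untested, reusing your mpmath integrand; FVector is just an illustrative name):

class FVector(theano.Op):
    __props__ = ('maxdegree', 'upper_bound', 'cutoff')

    # eta is now a vector; the output is a vector of the same length
    itypes = [tt.dscalar, tt.dvector]
    otypes = [tt.dvector]

    def __init__(self, maxdegree=20, upper_bound=60., cutoff=15.):
        super(FVector, self).__init__()
        self.maxdegree = maxdegree
        self.upper_bound = upper_bound
        self.cutoff = cutoff

    def perform(self, node, inputs, output_storage):
        j, etas = inputs
        j = float(j)
        out = np.empty(len(etas))
        for i, eta in enumerate(np.asarray(etas, dtype=float)):
            # Same integrand and bounds logic as the scalar Op above
            def integrand(x):
                return mpmath.power(x, j)/(mpmath.exp(x - eta) + 1.)
            if eta < self.cutoff:
                bounds = [0., eta + self.upper_bound]
            else:
                bounds = [0., eta, eta + self.upper_bound]
            out[i] = float(mpmath.quad(integrand, bounds,
                                       maxdegree=self.maxdegree))
        output_storage[0][0] = out

    def grad(self, inputs, output_gradients):
        j, etas = inputs
        g, = output_gradients
        # Element-wise: d/deta F(j, eta) = j * F(j - 1, eta)
        return [
            theano.gradient.grad_undefined(self, 0, j),
            g*j*self(j - 1., etas),
        ]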
In some cases (I didn’t check if this is the case here), it can be easier to compute gradient and value at the same time in perform, and then return the gradient as a second output of the Op. The grad method can then use value, grad = self(*inputs) and return g * grad for the gradient of value, and – if you don’t need second derivatives – undefined for the derivative of grad. The notebook @junpenglao mentioned does this. (Update: no it doesn’t. Can’t find the right example right now…)
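
Roughly, that pattern would look like this (a hypothetical sketch; compute_value_and_grad stands in for whatever evaluates both quantities in one pass):

class FValueAndGrad(theano.Op):
    itypes = [tt.dscalar, tt.dvector]
    otypes = [tt.dvector, tt.dvector]  # value and d/deta, respectively

    def perform(self, node, inputs, output_storage):
        j, etas = inputs
        # Placeholder: evaluate F(j, eta) and j * F(j - 1, eta) together
        value, grad = compute_value_and_grad(float(j), np.asarray(etas))
        output_storage[0][0] = value
        output_storage[1][0] = grad

    def grad(self, inputs, output_gradients):
        # Reuse the second output as the gradient of the first
        value, grad = self(*inputs)
        g_value, g_grad = output_gradients
        return [
            theano.gradient.grad_undefined(self, 0, inputs[0]),
            g_value * grad,
        ]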

@junpenglao,

Thanks, I’ll take a look at the notebook.

Cheers,

Jeff

@aseyboldt,

Thanks for your suggestions. I’ll try implementing an Op generalized to vectors. I was looking into Elemwise rather than putting a for loop in my perform method because I had thought theano would be able to optimize the Elemwise call, but not the python for-loop. Is this not the case?

I’ll check out numba integration as well; however, I am using mpmath’s numerical integration because it has the only implementation of tanh-sinh quadrature in Python that I’ve found. The integral I’m calculating is somewhat pathological, but tanh-sinh quadrature performs very well on it.

Finally, the gradient is defined in terms of the function itself: d/deta F(j, eta) = j*F(j-1,eta), so I think that the symbolic definition I’m using now is probably sufficient.

Cheers,

Jeff

Using elemwise does help in two ways:

  • It allows the fusion of ops: in python syntax [f(g(x)) for x in vals] instead of [f(y) for y in [g(x) for x in vals]], which avoids iterating over an array twice. This can matter a lot for memory-bound problems, where iterating over the array is more expensive than the actual computation. That clearly isn’t the case here. (See the small demonstration after this list.)
  • If all elemwise ops have C code, theano will generate C code that does the above, which can provide a large speedup in certain circumstances. The compiler can optimize this further, perhaps even use SIMD.
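
For example, a small demonstration of the fusion (assuming theano’s default optimizer):

x = tt.dvector('x')
y = tt.exp(tt.sqrt(x))  # two elemwise ops applied in sequence
fn = theano.function([x], y)
# debugprint shows a single fused Elemwise Composite node
theano.printing.debugprint(fn)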

For a python Op, especially one that is expensive for each element, this doesn’t matter at all.

You can probably get much better speed by moving this particular loop to numba, that is, if the integration is fast enough for this to matter at all.
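
For example, something along these lines (a sketch of the scipy + numba pattern; note it uses ordinary adaptive quadrature, not the tanh-sinh rule you need, so take it only as an illustration of avoiding the call overhead):

import math
from numba import cfunc, types
from scipy import LowLevelCallable
from scipy.integrate import quad

# scipy's low-level integrand signature is double f(int n, double *xx),
# where xx[0] is the integration variable and xx[1:] are the extra args
@cfunc(types.float64(types.intc, types.CPointer(types.float64)))
def fd_integrand(n, xx):
    x, j, eta = xx[0], xx[1], xx[2]
    return x**j / (math.exp(x - eta) + 1.0)

fd_callable = LowLevelCallable(fd_integrand.ctypes)
# F(1, 4) with a finite upper bound, as in the Op above
value, abserr = quad(fd_callable, 0., 64., args=(1., 4.))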