Updating PyMC2 Code to new version of PyMC(Deterministic variables obtained by integrating Gaussian processes)

Hi everyone,

I’m a newcomer to pymc and I’m trying to update some PyMC2 code to the latest version of PyMC(v5). I have a specific code snippet where all parameters except ‘s~pm.Normal’ are similar instances of GPSubmodel from version 2. The variable ‘r’ represents the input for a Gaussian process, for example, S(r) ~ gp. Here’s a portion of the code:

from scipy.integrate import cumtrapz
import pymc as pm 
......

@pm.deterministic
def func(r=self.mesh, S=S, dht=dht, s=factor, b=b, w=w, B=B):

    f = w.f_eval * (b.f_eval - dht.f_eval)
    u = cumtrapz(f, r)
    return s * u / (w.f_eval * (S.f_eval - B.f_eval))

I’m not sure how to rewrite this code using the new version of PyMC. Any help would be greatly appreciated.

1 Like

There are two bits of API that have changed. The first is easy: the pm.Deterministic decorator is now just a function that takes a pytensor TensorVariable and stores the computation in your idata. So it doesn’t do anything special. You would just write:

with pm.Model() as mod:
    # Other model stuff

    # Write the computation for your func, using the symbolic variables. You can't call .eval() anymore! 
    f =  w * (b - dht)
    u = cumtrapz(f, r)
    int_res = pm.Deterministic('int_res', s * u / (w* (S- B)))

So that’s how Deterministic works. This code will not work, though, because you can’t seamlessly just drop arbitrary numpy/scipy functions into PyMC. Integrating them requires you to jump through a few hoops. This is because it needs to be represented as a pytensor Op – an atom of the computational graph used to construct your model. You have two options:

  • Option 1:
    Dig into the code for cumtrapz and see if you can write it out using Pytensor functions. It looks like it might work just fine. Just write a normal Python function that expects pytensor inputs, uses pytensor functions on the inside, and does the computations that cumulative_trapezoid does.

  • Option 2:
    Wrap cumulative_trapezoid in a Pytensor Op. This tutorial is marketed as “custom likelihood functions”, but it actually shows how to make an arbitrary custom Op. The advantage of this approach is that you can directly call cumtrapz. The downside is that you will not get gradients for the result, so you will either have to figure those out yourself and implement them, or give up on NUTS as your sampler (not recommended).

Option 1 is by far the better option, unless there’s something about cumulative_trapezoid that absolutely cannot be represented in Pytensor. In my experience, this is usually only complex control flow or highly specialized linear algebra routines – you shouldn’t fall into either case.

BTW, there’s an open issue for a Pytensor integrate.quad function, which links to some JAX code that shows how to do it. If you wanted to write the Op and contribute it to Pytensor, I bet a lot of others would find it useful :slight_smile:

2 Likes

Hi, jessegrabowski, thank you very much for your help! I’ve looked into your suggestion, and I also found a similar issue in this post regarding custom Theano Ops for numerical integration: Custom Theano Op to do Numerical Integration – but it seems not entirely identical to my problem. My integration domain corresponds to the definition domain of a Gaussian process, where ‘r’ is the numerical domain over which I need to evaluate ‘f’.

In contrast, the integration domain in that post appears unrelated to their random variable. I’m a bit uncertain about how to approach such a situation.

Additionaly I’m not very familiar with PyTensor, which makes it challenging for me. However, I’m still willing to invest time and follow the advice you’ve provided to try and understand it.

If you have any further suggestions or guidance, I would greatly appreciate it. Once again, thank you for your help!

You’re just doing a deterministic transformation of variables, so I think you’re overthinking it? As least as written in the PyMC2 code, the function will know both r and f when it’s time to do the integral, so you just need to re-write the function to be “pytensor aware”.

Actually the cumulative_trapezoid function is quite simple. Here is a stripped down version with all the input checks removed:

def tupleset(t, i, value):
    l = list(t)
    l[i] = value
    return tuple(l)

def cumulative_trapezoid_simple(y, x=None, dx=1.0, axis=-1):
    y = np.asarray(y)
    if x is None:
        d = dx
    else:
        x = np.asarray(x)
        if x.ndim == 1:
            d = np.diff(x)
        else:
            d = np.diff(x, axis=axis)

    nd = len(y.shape)
    slice1 = tupleset((slice(None),)*nd, axis, slice(1, None))
    slice2 = tupleset((slice(None),)*nd, axis, slice(None, -1))
    res = np.cumsum(d * (y[slice1] + y[slice2]) / 2.0, axis=axis)
    
    return res

It’s just some fancy slicing (which pytensor supports), differencing (supported), and cumsum (supported). Actually the only thing that won’t work is len(y.shape), but it’s bizarre they used this anyway (ndim exists, and they know it since it was used for x). So switch that, and switch the np to pt and get the following result:

from pytensor import tensor as pt

def cumulative_trapezoid_pt(y, x, dx=1.0, axis=-1):
    if x is None:
        d = dx
    else:
        if x.ndim == 1:
            d = pt.diff(x)
        else:
            d = pt.diff(x, axis=axis)
    nd = y.ndim
    
    slice1 = tupleset((slice(None),)*nd, axis, slice(1, None))
    slice2 = tupleset((slice(None),)*nd, axis, slice(None, -1))
    res = pt.cumsum(d * (y[slice1] + y[slice2]) / 2.0, axis=axis)
    
    return res

This function you can drop into your PyMC model. Here’s a small numerical check to make sure it works:

f = lambda x: 2 * x
x = np.linspace(0, 10, 100)
y = f(x)

# This code compiles the pytensor function, you **will not** need this in your pymc model
# it's included for completeness/transparency
x_pt = pt.dvector('x')
y_pt = pt.dvector('y')
res = cumulative_trapezoid_pt(y_pt, x_pt)
cumint = pytensor.function([y_pt, x_pt], res)

# Here are the actual checks
from scipy import integrate
np.allclose(integrate.cumulative_trapezoid(y, x), cumint(y, x)) # Out: True
np.allclose(cumint(y, x), x[1:] ** 2) # Out: True
1 Like