Using pm.DensityDist and a customized likelihood with a black-box function

Hi,

I’m trying to use PyMC to find the optimal parameters that describe some observed data, but it’s not working. I have two vector parameters, X = (x1, x2, …, xn) and V = (v1, v2, …, vn). I also have a pre-defined class wrapping a black-box function that takes X and V and returns simulated data d1; the class also holds the observed data d0.

def likelihood(X,V):
    smod[0,2:] = X
    smod[1,2:] = V
    d1 = myBlackBox.calcD(smod)
    d0 = myBlackBox.getObserved()
    return ((d1-d0)**2).sum()

with pm.Model() as model:
   X = pm.Normal('X', mu=X_pr, sd=100, shape=len(X_pr))
   V = pm.Normal('V', mu=V_pr, sd=50, shape=len(V_pr))
   like = pm.DensityDist('like', likelihood, observed=dict(X=X, V=V))
   trace = pm.sample(10000)

This block of code is most likely riddled with bugs, but right now I’m getting a ValueError at the first statement in the likelihood function. smod is global and already contains some header information required by the black-box function. How do I map the values of X and V into smod so that it can be used to generate the simulated data? Also, are the statements inside my pm.Model context manager correct? X and V are vector parameters with different mean values for their respective elements (i.e. mu=X_pr, where X_pr is also a vector). Can I use pm.DensityDist in this manner?

Thanks!

You probably need to wrap your black-box function with as_op. For example, see a similar discussion in Connecting PyMC3 to external code - help with understanding Theano custom Ops, and more in https://discourse.pymc.io/search?q=as_op
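
For what it’s worth, here is a minimal sketch of that wrapping, reusing the smod/myBlackBox names from your first post (the itypes/otypes are my assumption: two float64 vectors in, one float64 scalar out). Inside a function decorated with @as_op the inputs arrive as plain NumPy arrays, so assigning them into smod no longer raises:

import numpy as np
import theano.tensor as tt
from theano.compile.ops import as_op

@as_op(itypes=[tt.dvector, tt.dvector], otypes=[tt.dscalar])
def likelihood(X, V):
    # X and V are NumPy arrays here, not symbolic tensors
    smod[0, 2:] = X
    smod[1, 2:] = V
    d1 = myBlackBox.calcD(smod)
    d0 = myBlackBox.getObserved()
    return np.asarray(((d1 - d0) ** 2).sum())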

@junpenglao thanks! But with as_op, will I still be able to use the gradient-based samplers like HMC and NUTS?

I’d like to use an L2-norm likelihood function

P(d|m) \propto \exp\left(-\frac{\|d - f(m)\|^2}{\sigma^2}\right)

where d is the observed data, m are my parameters, and f is the external function that returns simulated data. So can I just write a customized Theano function to return the difference between d and f(m), and inside pm.Model() use Normal()?

import theano.tensor as tt
from theano.compile.ops import as_op

@as_op(itypes=[tt.dvector, tt.dvector], otypes=[tt.dscalar])
def myFunc(X,V):
    smod[0,2:] = X
    smod[1,2:] = V
    d1 = myBlackBox.calcD(smod)
    d0 = myBlackBox.getObserved()
    return ((d1-d0)**2).sum()

with pm.Model() as model:
   X = pm.Normal('X', mu=X_pr, sd=100, shape=len(X_pr))
   V = pm.Normal('V', mu=V_pr, sd=50, shape=len(V_pr))
   like = myFunc(X,V)
   obs = pm.Normal('obs', mu=like, observed=d0)
   trace = pm.sample(10000)

  • No, gradient information is not available when using @as_op.
  • Yes, using a Normal likelihood with constant sigma (i.e., not a latent variable) is the same as L2 regularization.
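
For the second point, spelled out: taking the log of a Normal density with a fixed sigma gives

\log P(d|m) = -\frac{\|d - f(m)\|^2}{2\sigma^2} + \text{const}

so maximizing that log-likelihood is the same as minimizing the squared misfit \|d - f(m)\|^2.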

Ok, thanks! So in that case, should I write an extension of the theano.Op class and put my gradient computation in its grad() method?

Yes. There are a few threads on that, see e.g. Custom theano Op to do numerical integration
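
For a rough idea of the shape of such an Op (a sketch only; f and df below stand in for your external log-likelihood and its gradient, they are not part of any real API), the important detail is that grad() must return a list with one entry per input, each entry having the same type as that input:

import numpy as np
import theano
import theano.tensor as tt

class ExternalLogLike(theano.tensor.Op):
    # one vector of parameters in, one scalar log-likelihood out
    itypes = [tt.dvector]
    otypes = [tt.dscalar]

    def __init__(self, f, df):
        self.f = f                      # callable returning the log-likelihood
        self.logpgrad = ExternalLogLikeGrad(df)

    def perform(self, node, inputs, outputs):
        (x,) = inputs
        outputs[0][0] = np.array(self.f(x))

    def grad(self, inputs, output_grads):
        (x,) = inputs
        # one gradient term per input, chained with the incoming gradient
        return [output_grads[0] * self.logpgrad(x)]

class ExternalLogLikeGrad(theano.tensor.Op):
    itypes = [tt.dvector]
    otypes = [tt.dvector]

    def __init__(self, df):
        self.df = df                    # callable returning d(loglike)/dx

    def perform(self, node, inputs, outputs):
        (x,) = inputs
        outputs[0][0] = np.asarray(self.df(x))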

Hi, I implemented the tt.Op extension and it works with the Metropolis sampler. However, I’m a bit stuck on getting the gradient to work (so I can use NUTS). It returns an error saying the Op returned the wrong number of gradient terms. In my case the inputs are 3 vector variables, so the total number of free parameters is 3*len(ABC). As a test I set them to just 3 scalars, so the input to the Op class has a length of 3 and the gradient also has a length of 3, so I’m not sure why Theano is complaining.

### Log likelihood class with gradient ####
class LogLikelihood(theano.tensor.Op):
    #itypes = [tt.dvector, tt.dvector, tt.dvector]
    itypes = [tt.scalar, tt.dscalar, tt.dscalar]
    otypes = [tt.fscalar]

    def __init__(self, fm):
        self.fm = fm
        self.grad = LoglikeGrad(self.fm)
        super(LogLikelihood, self).__init__()

    def perform(self,node,inputs,outputs):

        A, B, C = inputs
        model = np.copy(self.fm.model)
        model = np.array([A,B,C]).T
        self.fm.setModel(model)
        likelihood = self.fm.getOF()
        outputs[0][0] = np.array(likelihood)

    def grad(self, inputs, g):
        A, B, C = inputs
        return [g[0]*self.grad(A,B,C)]

### Gradient class  ###
class LoglikeGrad(theano.tensor.Op):
    #itypes = [tt.dvector, tt.dvector, tt.dvector]
    itypes = [tt.scalar, tt.dscalar, tt.dscalar]
    otypes = [tt.fvector]
    
    def __init__(self, fm):
        self.fm = fm
        super(LoglikeGrad, self).__init__()

    def perform(self,node,inputs,outputs):
        A, B, C = inputs
        model = np.copy(self.fm.model)
        model = np.array([A,B,C]).T
        self.fm.setModel(model)
        self.fm.buildFrechet()
        grads = self.fm.einSum()
        outputs[0][0] = grads

### Instantiate logLike tt.Op class
fm = myOwnClass(data,ABC_init)
logP = LogLikelihood(fm)

## Set up probabilistic model
with pm.Model():
    
    A = pm.Normal('A', mu=ABC_init[-1,1], sd=200.0)
    B = pm.Normal('B', mu=ABC_init[-1,2], sd=100.0)
    C = pm.Normal('C', mu=ABC_init[-1,3], sd=0.1)
    
    # Custom likelihood
    likelihood = pm.DensityDist('likelihood', lambda A,B,C: logP(A,B,C), observed={'A':A,'B':B,'C':C})
    
    # Sample from posterior
    trace = pm.sample(draws=5000)

Did you validate the gradient: http://docs.pymc.io/advanced_theano.html (see near the end)?
Also, you might find recent PR from @mattpitkin helpful: https://github.com/pymc-devs/pymc3/pull/3169
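
For example, a quick sketch of that check (the three test values here are arbitrary and just need dtypes matching your Op’s itypes):

import numpy as np
import theano.tests.unittest_tools as utt

# verify_grad builds the symbolic gradient of the Op's output and
# compares it against a finite-difference estimate at the test point.
utt.verify_grad(logP, [np.array(1.0), np.array(2.0), np.array(0.5)])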

I just tried to import the unit test tools but got an error: “No module named nose_parameterized”. Could this be a version issue? I’m running Python 2.7 with PyMC3 3.0 and Theano 0.9.0.dev.

Could you please try to upgrade your pymc3 and theano?