# Using pm.DensityDist and customized likelihood with a black-box function

Hi,

I’m trying to use PyMC to find the optimal parameters that describe some observed data, but it’s not working. I have two vector-valued parameters: X = (x1, x2, …, xn) and V = (v1, v2, …, vn). I also have a pre-defined class that wraps a black-box function: it takes in X and V, returns simulated data d1, and it also holds the observed data d0.

```python
def likelihood(X, V):
    smod[0, 2:] = X
    smod[1, 2:] = V
    d1 = myBlackBox.calcD(smod)
    d0 = myBlackBox.getObserved()
    return ((d1 - d0) ** 2).sum()
```

```python
with pm.Model() as model:
    X = pm.Normal('X', mu=X_pr, sd=100, shape=len(X_pr))
    V = pm.Normal('V', mu=V_pr, sd=50, shape=len(V_pr))
    like = pm.DensityDist('like', likelihood, observed=dict(X=X, V=V))
    trace = pm.sample(10000)
```


This block of code is most likely riddled with bugs, but I’m getting a ValueError at the first statement in the likelihood function. smod is global and already contains some header information which is required by the black-box function. So how do I map the values of X and V into smod so that it can be used to generate the simulated data? Also, are the statements within my pm.Model context manager correct? X and V are vector-valued parameters with different mean values for their respective elements (i.e. mu=X_pr, where X_pr is also a vector). Can I use pm.DensityDist in this manner?
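For reference, here is a small numpy sketch of the mapping I’m trying to do, with made-up shapes and header values (the real smod and black box are more involved):

```python
import numpy as np

# Hypothetical sizes: 2 header columns, n free parameters per row
n = 4
smod = np.zeros((2, 2 + n))
smod[:, :2] = [[9.9, 8.8], [7.7, 6.6]]   # stand-in header information

X = np.array([1.0, 2.0, 3.0, 4.0])
V = np.array([10.0, 20.0, 30.0, 40.0])

# The mapping itself is a slice assignment past the header columns;
# this works when X and V are concrete numpy arrays
smod[0, 2:] = X
smod[1, 2:] = V
```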

Thanks!

You probably need to wrap your black-box function with as_op. For example, see a similar discussion in Connecting PyMC3 to external code - help with understanding Theano custom Ops, and more in https://discourse.pymc.io/search?q=as_op

@junpenglao thanks! But with as_op, will I still be able to use gradient-based samplers like HMC and NUTS?

I’d like to use an L2-norm likelihood function

P(d \mid m) \propto \exp\left(-\frac{\lVert d - f(m)\rVert^2}{\sigma^2}\right)

where d is the observed data, m are my parameters, and f is the external function that returns simulated data. So can I just write a customized Theano function to return the difference between d and f(m), and inside pm.Model() use Normal()?

```python
import theano.tensor as tt
from theano.compile.ops import as_op

# itypes must list one type per input; X and V are double vectors,
# and the summed squared misfit is a double scalar
@as_op(itypes=[tt.dvector, tt.dvector], otypes=[tt.dscalar])
def myFunc(X, V):
    smod[0, 2:] = X
    smod[1, 2:] = V
    d1 = myBlackBox.calcD(smod)
    d0 = myBlackBox.getObserved()
    return ((d1 - d0) ** 2).sum()
```

```python
with pm.Model() as model:
    X = pm.Normal('X', mu=X_pr, sd=100, shape=len(X_pr))
    V = pm.Normal('V', mu=V_pr, sd=50, shape=len(V_pr))
    like = myFunc(X, V)
    obs = pm.Normal('obs', mu=like, observed=d0)
    trace = pm.sample(10000)
```

• No, gradient information is not available with @as_op.
• Yes, using a Normal likelihood with constant sigma (i.e., not a latent variable) is the same as L2 regularization.
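To see the equivalence concretely, here is a quick numpy check (made-up values) that the Normal log-density with fixed sigma is just the scaled L2 misfit plus an additive constant:

```python
import numpy as np

sigma = 2.0                       # fixed (non-latent) noise level
d = np.array([1.0, 2.5, 3.0])     # "observed" data
fm = np.array([0.8, 2.0, 3.5])    # black-box prediction f(m)

# Gaussian log-likelihood summed over data points
logp_normal = np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                     - (d - fm) ** 2 / (2 * sigma**2))

# The same quantity written as a constant minus the scaled L2 misfit
const = -0.5 * d.size * np.log(2 * np.pi * sigma**2)
logp_l2 = const - np.sum((d - fm) ** 2) / (2 * sigma**2)
```

Maximizing either expression over the parameters gives the same answer, since the constant does not depend on f(m).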

Ok, thanks! So in that case, should I write an extension of the theano.op class and place my gradient computation in the grad() method?

Yes. There are a few threads on that, see e.g. Custom theano Op to do numerical integration
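If analytic derivatives of the black box aren’t available, one option those threads discuss is filling grad() with a numerical estimate computed inside a second Op. A minimal sketch of a central finite-difference estimator (the names here are illustrative, not from any particular thread):

```python
import numpy as np

def fd_grad(f, theta, eps=1e-6):
    """Central finite-difference gradient of a scalar black-box f(theta)."""
    theta = np.asarray(theta, dtype=float)
    grad = np.empty_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        # perturb one parameter at a time, holding the others fixed
        grad[i] = (f(theta + step) - f(theta - step)) / (2.0 * eps)
    return grad
```

Each gradient component costs two extra evaluations of the black box, so this can be expensive for many parameters, but it lets NUTS run at all.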

Hi, I implemented the tt.Op extension and it works with the Metropolis sampler. However, I’m a bit stuck on getting the gradient to work (so I can use NUTS). It returns an error saying the Op returned the wrong number of gradient terms. In my case the inputs are 3 vector variables, so the total number of free parameters is 3*len(ABC). As a test I set them to just 3 scalars, so the input to the Op class has length 3 and the gradient also has length 3, so I’m not sure why Theano is complaining.

```python
### Log-likelihood class with gradient ###
class LogLikelihood(theano.tensor.Op):
    # itypes = [tt.dvector, tt.dvector, tt.dvector]
    itypes = [tt.dscalar, tt.dscalar, tt.dscalar]
    otypes = [tt.dscalar]

    def __init__(self, fm):
        self.fm = fm
        self.grad_op = LogLikelihoodGrad(fm)
        super(LogLikelihood, self).__init__()

    def perform(self, node, inputs, outputs):
        A, B, C = inputs
        model = np.copy(self.fm.model)
        model = np.array([A, B, C]).T
        self.fm.setModel(model)
        likelihood = self.fm.getOF()
        outputs[0][0] = np.array(likelihood)

    def grad(self, inputs, output_grads):
        A, B, C = inputs
        # grad() must return a list with exactly one entry per input
        g = self.grad_op(A, B, C)
        return [output_grads[0] * g[0],
                output_grads[0] * g[1],
                output_grads[0] * g[2]]


class LogLikelihoodGrad(theano.tensor.Op):
    # itypes = [tt.dvector, tt.dvector, tt.dvector]
    itypes = [tt.dscalar, tt.dscalar, tt.dscalar]
    otypes = [tt.dvector]

    def __init__(self, fm):
        self.fm = fm
        super(LogLikelihoodGrad, self).__init__()

    def perform(self, node, inputs, outputs):
        A, B, C = inputs
        model = np.copy(self.fm.model)
        model = np.array([A, B, C]).T
        self.fm.setModel(model)
        self.fm.buildFrechet()
        # the length-3 gradient vector must be written to outputs[0][0] here
```

```python
### Instantiate the logLike tt.Op class
fm = myOwnClass(data, ABC_init)
logP = LogLikelihood(fm)

## Set up the probabilistic model
with pm.Model():
    A = pm.Normal('A', mu=ABC_init[-1, 1], sd=200.0)
    B = pm.Normal('B', mu=ABC_init[-1, 2], sd=100.0)
    C = pm.Normal('C', mu=ABC_init[-1, 3], sd=0.1)

    # Custom likelihood
    likelihood = pm.DensityDist('likelihood',
                                lambda A, B, C: logP(A, B, C),
                                observed={'A': A, 'B': B, 'C': C})

    # Sample from posterior
    trace = pm.sample(draws=5000)
```



Also, you might find the recent PR from @mattpitkin helpful: https://github.com/pymc-devs/pymc3/pull/3169

I just tried to import the unit-test tools but got an error: “No module named nose_parameterized”. Could this be a version issue? I’m running Python 2.7, PyMC3 3.0, and Theano 0.9.0.dev.