Problems with defining a custom likelihood that calls a Python wrapper function

I’m trying to do some Bayesian inference using PyMC3 to reproduce the results of a paper. My question is very similar to the one posed in [1].
Some context before I come to the issue I’m having. I’m trying to solve a toy problem, where I have a small reaction network with 24 reactions, and I am trying to determine the reaction rates. I have the Python wrapper function (“grainwrap”) of our group’s chemical code that takes in reaction rates (as a vector) and outputs the abundances of all the products. We know the final abundances of four of the products.

As the prior, I have set the reaction rate vector, k, to be log-uniform. Typically, the likelihood function that would be used is the L2-norm likelihood function, where the “data” are the observed abundances and the simulated abundances comes from grainwrap. However, as we only have abundances (with uncertainties) for four of the products, I had to write a small function (“index_selector”) that extracts just the abundances of those four products to evaluate the likelihood. As I knew which product abundances I wanted, I included their indices as default values for one of the arguments already.

However, as I am having to call an external function, grainwrap, to evaluate my likelihood, I’ve had to wrap “as_op”, which precludes me from using gradient-based samplers. That’s fine for now.

My code is as follows:

import numpy as np
import pymc3 as pm
import grainwrap
import theano
import theano.tensor as tt
from theano.tensor import _shared
from theano.compile.ops import as_op
import pickle
import sys

lower_exp = -30
upper_exp = -5
reaction_rate_vector_length = 24
number_of_samples = 1e4

data_abundances = np.asarray([3.4e-3,5.2e-6,3.2e-5,1.0e-5])
data_errors = np.asarray([1.8e-3,2.6e-6, 2.0e-5,4.0e-6])

def index_selector(vector, indices = [0,1,2,8]):
    output_vec = []
    for i in indices:
        output_vec.append(vector[i])
    return np.asarray(output_vec)
    
@as_op(itypes=[tt.dvector], otypes=[tt.dscalar])
def log_likelihood_func_(k):
    return -0.5*tt.sum(((data_abundances - index_selector(grainwrap.getfinalabunds(k)[0]))/data_errors)**2)


with pm.Model() as model:
    log_k = pm.Uniform("log_k", -30, -5, shape = reaction_rate_vector_length)
    like = log_likelihood_func_(10**log_k)
    obs = pm.Normal("obs", mu = like, observed = data_abundances , sigma = data_errors)
    trace = pm.sample(int(number_of_samples) ,tune = 1000, cores = 1, chains = 2,  progressbar = True)

When I run this, I get the error:

“The error when converting the test value to that variable type:
Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?”

This is due to the line “like = log_likelihood_func_(10**log_k)”. I wasn’t able to overcome this issue, so I decided to make use of DensityDist, in the hopes that that would solve my problem.

I then rewrote the code using [2] as an inspiration. Initially, I put my likelihood function inside my “with” statement, but I then moved it out, as functions with the “as_op” wrapper don’t accept keyword arguments. This is then what the code looks like:

import numpy as np
import pymc3 as pm
import grainwrap
import theano
import theano.tensor as tt
from theano.compile.ops import as_op
import pickle
import sys

lower_exp = -30
upper_exp = -5
reaction_rate_vector_length = 24
number_of_samples = 1e4

data_abundances = np.asarray([5.2e-6,1.2e-5,1.32e-5,4.0e-5])
data_errors = np.asarray([2.35e-6,0.8e-5,0.72e-5,1.25e-5])

def index_selector(vector, indices = [0,1,2,8]):
output_vec = []
for i in indices:
    output_vec.append(vector[i])
return np.asarray(output_vec)

@as_op(itypes=[tt.dvector, tt.dvector, tt.dvector], otypes=[tt.dscalar])
def log_likelihood_func_(k, data_abundances, data_errors):
return -0.5*tt.sum(((data_abundances - index_selector(grainwrap.getfinalabunds(k)[0]))/data_errors)**2)

with pm.Model() as model:
log_k = pm.Uniform("log_k", -30, -5, shape = reaction_rate_vector_length)
k = pm.Deterministic("k", 10**log_k)

def logp(data_abundances, data_errors):
    return log_likelihood_func_(k, data_abundances, data_errors)
                                
like = pm.DensityDist("like", log_likelihood_func_, observed = {"k":k,"data_abundances": data_abundances, "data_errors":data_errors })
trace = pm.sample(int(number_of_samples) ,tune = 1000, cores = 1, chains = 2,  progressbar = True)

Notice I’ve added “data_abundances” and “data_errors” as arguments to my likelihood function and then inserted another function “logp” inside the with statement to call that likelihood function. However, I then proceed to get the error “TypeError: make_node() got an unexpected keyword argument ‘k’” that points to the line starting with “like=”. I was under the impression, based on what was written in [2] that I should include parameters and data in my “observed” dictionary. I’m quite stuck at this point and am not sure how to proceed further. Can someone help please?