Pointwise log-likelihood for loo/waic with custom "black-box" probability distribution


I’d like to estimate a model (and then do some model comparisons) using pymc. The model for generating the predicted probability distribution itself depends on an external module and requires a numpy array as input. So, while I have control over how the likelihood is calculated, I have no control over how the model prediction is obtained (numpy array in → model predictions as numpy array back). I followed the guide on how to use a “black box” likelihood to wrap an instance of my model (which internally calls the external code, handles the observed data, likelihood calculation etc.) in a custom pytensor op to call and return the log_likelihood.

class PtDdmLogLike(pt.Op):

  itypes = [pt.dvector] # expects a vector of parameter values when called
  otypes = [pt.dscalar] # returns a single scalar value (the log likelihood)

  def __init__(self, ddm_model):
      # set up my custom backend model instance
      self.ddm_model = deepcopy(ddm_model) 

  def perform(self, node, inputs, outputs):
      (theta,) = inputs # unpack inputs

      # modify the parameters of the model

      # Call the log-likelihood function (the observed data is an attribute of ddm_model)
      logl = self.ddm_model._log_like()

      outputs[0][0] = np.array(logl) # output the log likelihood

Then I use a potential to call the op…

data = pd.read_csv(...)
my_model = model(data) # create an instance of my model

with bm_model:
    priorA = ...
    priorB = ...
    theta = [priorA , priorB]
    # create our plugin Op instance
    logl = PtDdmLogLike(my_model) # plug in my model instance
    # Use a potential to "call" the Op
    pm.Potential("Likelihood", logl(theta))
    custdata = pm.sample(...)

This all works fine. However, I would like to do some model comparisons, and, as far as I can see, many of the techniques require the calculation of WAIC/LOO, for which I need access to the pointwise log-likelihood estimates of each individual data point. This is not a problem in principle, as I could easily modify the _log_like() function of my model instance to return a vector of log-likelihoods for each individual observation. However, I am struggling with how to connect it within the pymc framework.

From a search on pymc discourse, I found this question, which is highly related to my question. However, the links provided seem to be broken, and older versions of the corresponding article use code that is no longer available in newer versions of pymc/arviz.

I also found questions/answers that suggested using pm.Deterministic or pm.DensityDist, but they left open how to implement it exactly or were not suitable for my case.

Thus, is there a tutorial which demonstrates how to call a custom “black box” likelihood function (which requires a numpy array of parameters as input) and subsequently calculate loo/waic?

All help is much appreciated :slight_smile: