Explanation of logp and customDist for Blackbox likelihood functions


I have implemented a blackbox likelihood function in my model, using the framework described in Using a “black box” likelihood function (numpy) — PyMC example gallery.

In this code, there is a user-defined log likelihood function:

def my_loglike(theta, x, data, sigma):
    model = my_model(theta, x)
    return -(0.5 / sigma**2) * np.sum((data - model) ** 2)
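To make the snippet self-contained, here is a minimal sketch of calling `my_loglike`; the straight-line `my_model` below is an assumption matching the gallery example, not part of the code above:

```python
import numpy as np

def my_model(theta, x):
    # hypothetical straight-line model m*x + c, as in the gallery example
    m, c = theta
    return m * x + c

def my_loglike(theta, x, data, sigma):
    model = my_model(theta, x)
    # summed Gaussian log-likelihood (up to an additive constant)
    return -(0.5 / sigma**2) * np.sum((data - model) ** 2)

x = np.linspace(0.0, 9.0, 10)
data = my_model([0.5, 3.0], x)  # noiseless synthetic data
print(my_loglike([0.5, 3.0], x, data, sigma=1.0))  # perfect fit -> 0.0
```

Note the function returns a single scalar, because of the `np.sum` — that detail matters later for model comparison.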

The log likelihood is then calculated using an Op class and included in the model with pm.Potential:

# create our Op
logl = LogLike(my_loglike, data, x, sigma)

# use a Potential to "call" the Op and include it in the logp computation
pm.Potential("likelihood", logl(theta))

Though this works for parameter estimation, I want to perform model comparison using WAIC or by calculating the Bayes factor for different models. I was told this can be done using pm.CustomDist and defining a logp function.

What is the logp function - is it the same as the my_loglike function? In the documentation of pm.CustomDist, an example logp function is shown as:

def logp(value: TensorVariable, mu: TensorVariable) -> TensorVariable:
    return -(value - mu)**2

Is this needed when I have already included the my_loglike function?

This might be a very basic question but I’m new to Bayesian modelling and have been trying to understand this for some time now. Thanks!

The example shouldn’t have data and x be passed to __init__, which makes it unnecessarily restrictive and doesn’t really match the CustomDist API. I’ll update the example, but for now your logp function would look something like:

def logp(value, x, theta):
    # `value` and `x` are ignored here because `data` and `x` are stored inside the LogLike Op at init time
    return logl(theta)

with pm.Model() as m:
    ...  # Priors and definition of `logl` `Op`
    pm.CustomDist("loglike", x, theta, logp=logp, observed=data)

Importantly, you have to make sure your logl function returns the log-likelihood of each observation if you wish to use it for model comparison. Any summation will be done by PyMC later down the road.
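To illustrate, here is a sketch of a pointwise version of my_loglike (the straight-line my_model is an assumption borrowed from the gallery example); the only change is dropping `np.sum`, so PyMC can sum the per-observation terms itself:

```python
import numpy as np

def my_model(theta, x):
    # hypothetical straight-line model, as in the gallery example
    m, c = theta
    return m * x + c

def my_loglike_pointwise(theta, x, data, sigma):
    model = my_model(theta, x)
    # no np.sum: return one log-likelihood term per observation
    return -(0.5 / sigma**2) * (data - model) ** 2
```

Summing the returned vector recovers exactly what the original scalar version computed.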

Thanks Ricardo, that makes sense.

To modify my code to get the log-likelihood of each observation, I’m guessing I need to change the my_loglike function to not sum the likelihoods, and change the LogLike otype to pt.dvector, to get a vector of log-likelihood values?


As promised: Update blackbox likelihood example by ricardoV94 · Pull Request #631 · pymc-devs/pymc-examples · GitHub
