Hierarchical modeling with a black box likelihood function

Hi,
I have a model in which my data (x and y) are both measured random variables that are expected to have linear covariance i.e. they follow the form

The angle of a line fit to these data (theta) is the only parameter of interest in this case. All other parameters are nuisance parameters. The data in question tend to look something like this:

This is a relatively simple model, and making some assumptions, I have an analytical likelihood function for theta given these data. As it’s a one dimensional problem, I can just do a grid search and
that works fine (or use a black box likelihood function in PyMC3).

I now want to update this model as a hierarchical model. I have multiple experiments for which these data (x and y) are collected. Some of the experiments yield highly biased estimates with an uninformative posterior on theta, whereas some yield accurate estimates with an informative posterior on theta. I’m now struggling to parameterize this as none of the Pymc3 examples I have seen have a black box likelihood function for their multi level models.

As I understand it, I should have a parameter for each of my thetas, the likelihoods of which are determined by my black box function, and another parameter for my overall distribution of thetas (let’s call it omega) where P(theta | omega) looks like some sort of long tailed distribution. If any of you could help me parameterize this in a way that PyMC can understand, or if there are any examples where people have similarly done hierarchical modeling with black box likelihood functions, I’d appreciate knowing about it.

If the likelihood function is available - i.e., you can write down a logp function like:

def log_prob(*parameters):
    def logp(x, y):
        return f(x, y, *parameters)
    return logp

That would be the most straightforward.

I think you’re looking for DensityDist. As long as you have an implementation of your likelihood using theano you should be good to go.

def loglik_x_theano(theta, t, x):
  ...

def loglik_y_theano(theta, t, b, y):
  ...

with pm.Model() as model:
  theta_hp = pm.Bound(pm.Normal, lower=0.0, upper=3.1415)('theta_hp', 0., 1.)
  b_hp = pm.Normal(0, 10.)
  for k, (xvec, yvec, tvec) in enumerate(experimental_data):
    theta_k = pm.Bound(pm.Normal, lower=0.0, upper=3.1415)('theta_%d' % k, theta_hp, 1.)
    b_k = pm.Normal('b_%d' % k, b_hp, 1.)
    def lik_x_k(x_, t_):
      return loglik_x_theano(theta_k, t_, x_)
    def lik_y_k(y_, t_, b_):
      return loglik_y_theano(theta_k, t_, b_, y_)
    x_k = pm.DensityDist('x_%d' % k, loglik_x_theano, observed={'t_': tvec, 'x_': xvec})
    y_k = pm.DensityDist('y_%d' % k, loglik_y_theano, observed={'t_': tvec, 'b_': b, 'y_': yvec}

If your likelihood is truly black-box (you can call a non-Theano function; but cannot write it as a sequence of Theano operations) see Using pm.DensityDist and customized likelihood with a black-box function

2 Likes