Hierarchical Model with Black Box Loglikelihood

I have a question about implementing a Hierarchical model in PyMC using a Black Box loglikelihood function. Specifically, I want to make sure that PyMC is receiving correct inputs from the likelihood function. I have a model with 4 parameters, with one partially pooled across three data sets. I have three data sets, so I need to call the likelihood function for each of them using Potential. This gives me the following code:

# create Op which calls likelihood function and inputs data (rt)
logl0 = LogLike(loglikelihood, rt[0])
logl1 = LogLike(loglikelihood, rt[1])
logl2 = LogLike(loglikelihood, rt[2])

# define dictionary of subjects
coords = {'subjects' : [0,1,2]}
# use PyMC to sample from log-likelihood
with pm.Model(coords = coords) as model:
    # define priors
    tnd = pm.Uniform("tnd", lower = 0.0, upper = 1.0)
    v = pm.Uniform("v", lower = 0.0, upper = 5.0)
    sigma = 1.0
    a_mu = pm.Normal('a_mu', mu = 2.0, sigma = 0.75)
    a_sigma = pm.Normal('a_sigma', mu = 1.0, sigma = 0.25)
    a = pm.Normal("a", mu = a_mu, sigma = a_sigma, dims = 'subjects')
    # turn priors into tensor variables
    phi_pymc00 = at.as_tensor_variable([tnd, v, sigma, a[0]])
    phi_pymc01 = at.as_tensor_variable([tnd, v, sigma, a[0]])
    phi_pymc02 = at.as_tensor_variable([tnd, v, sigma, a[0]])
    phi_pymc10 = at.as_tensor_variable([tnd, v, sigma, a[1]])
    phi_pymc11 = at.as_tensor_variable([tnd, v, sigma, a[1]])
    phi_pymc12 = at.as_tensor_variable([tnd, v, sigma, a[1]])
    phi_pymc20 = at.as_tensor_variable([tnd, v, sigma, a[2]])
    phi_pymc21 = at.as_tensor_variable([tnd, v, sigma, a[2]])
    phi_pymc22 = at.as_tensor_variable([tnd, v, sigma, a[2]])
    # use Potential to call the loglikelihood function
    pm.Potential("llh0", logl0(phi_pymc00, phi_pymc01, phi_pymc02))
    pm.Potential("llh1", logl1(phi_pymc10, phi_pymc11, phi_pymc12))
    pm.Potential("llh2", logl2(phi_pymc20, phi_pymc21, phi_pymc22))

Will calling three potential functions like this cause and issue, or is this the correct way to instantiate my model? It runs without issue, but I don’t know if there is a more correct way to run this. Below is the model graph for reference in case it is helpful.

Multiple Potentials should be fine but why are you repeating the same call to as_tensor_variables 3 times?

I don’t actually need to do that for this case. To test the model, I’m simulating data, and in the normal case, each of the individual as_tensor_variable callouts would have slightly different parameters. To simplify testing for the hierarchical version, I made them all the same, but never removed the extra callouts.

An additional question on top of this. Say I want to do this with 50 datasets. Thus, I would need 50 pm.Potential callouts. This becomes too time consuming to hard code, especially if I want a variable amount of data sets. I know that looping isn’t a great idea within aesara/pytensor, so can I use scan to replace it instead? What is the best way to go about replacing a loop of this kind:

for ii in range(50):
     pm.Potential('llh' + str(ii), logl(phi[ii]))

Can’t you make your potential accept higher dimensionality data? Then if things have the same shapes you could just pass the whole block of parameters instead of slicing it 50 times.

If scan is too difficult to use, I can do that. I’m running into a bit of a problem with the as_tensor_variable function when I do this. Ideally, I would like to stuff my phi_pymc variables into a list, then turn that list into a tensor. So, something like:

p0 = [tnd, v, sigma, a[0]]
p1 = [tnd, v, sigma, a[1]]
p2 = [tnd, v, sigma, a[2]]
phi_pymc = at.as_tensor_variable([p0, p1, p2])

This doesn’t work however because I can’t convert objects into tensor variables. I get the following error:

 TypeError: Unsupported dtype for TensorType: object

Is there a way to get around this, or do I just have to shove all priors into a single long list?

You can convert every item to a pytensor variable first and then stack them together (if they have compatible shapes).

You can also use a scan, nothing wrong with it. But when you can use single arrays and treat data in batches it’s usually more performant because of vectorization.

That of course depends on what you are doing in that Potential call

Converting them to pytensor variables and stacking worked great! The potential call is flexible, I can make that function do whatever it needs to now that I have more input input flexibility.

1 Like