Hi, so I have a black-box function agent.nLLH which takes two arguments: param (a list of parameter values) and subj.data (a numpy matrix encoding learning data from one person). There are nSub = 94 people, and agent.data is a list of 94 subj.data matrices. I want to fit a set of parameters to each person while imposing a pre-specified prior at the population level over all 94 people. Here is the code I have, which doesn't work. I wonder where I messed up the syntax.
with pm.Model() as model:
    learn_rate = pm.Beta("learn_rate", alpha=0.007, beta=0.018, shape=nSub)
    noise = pm.Uniform("noise", lower=0, upper=1, shape=nSub)
    temperature = pm.Gamma("temperature", mu=4.83, sigma=0.73, shape=nSub)
    param = [learn_rate, noise, temperature]
    like = pm.Potential("like", -agent.nLLH(param, agent.data))
    idata = pm.sample(cores=4, chains=2, return_inferencedata=True)
It's some error within the agent.nLLH function, something like a matrix size mismatch. No, I did not use Theano, just numpy/Python. Is it required to use Theano?
The error message is like this:
File "/Users/honeybakedham/Box/SPRL_v3_project/Analysis/Modeling/running_station/mcmc.py", line 21, in
    like = pm.Potential('like', -agent.nLLH(param, agent.data))
File "/Users/honeybakedham/Box/SPRL_v3_project/Analysis/Modeling/running_station/src/agents.py", line 129, in nLLH
    Qstim = Q[idx, stim]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2376,) (2376,14)
Would pm.DensityDist spare me from using Theano? I want to see if there is a way of avoiding rewriting agent.nLLH. Also, I heard that in the upcoming PyMC4, Theano will no longer be compatible?
Param = {self.parameter_names[i]: param[i] for i in range(self.numParam)}
s = self.colnamesOut.index('stim')
r = self.colnamesOut.index('reward')
a = self.colnamesOut.index('action')
# prep work
llh, uni_p = np.zeros(self.num_block), 1 / self.na  # define uniform policy for later use
data_block, Q = np.array(data_subj), np.full((self.num_block, self.ns, self.na), np.mean(self.rewards))
for row in range(data_block.shape[1]):  # for each row of this particular block
    data_row = data_block[:, row]  # all the data for this row
    idx = np.argwhere(~pd.isnull(data_row).any(axis=1)).flatten()  # find blocks where this row does not contain NaN
    if idx.size == 0:  # if all blocks contain NaN, skip this row
        continue
    # info for the current trial
    reward, stim, action = data_row[idx, r], data_row[idx, s].astype(int) - 1, data_row[idx, a].astype(int) - 1
    Qstim = Q[idx, stim]
    # policy layer (compute likelihood)
    lh = policy_backward(Qstim, action, Param['temperature'], Param['noise'], uni_p)
    if np.any(lh == 0):  # if any likelihood == 0,
        return np.inf  # the whole data set is impossible under this model
    llh[idx] += np.log(lh)  # update log likelihood
    # update layer
    q_new = update_forward(Qstim[np.arange(len(idx)), action], reward, Param['learn_rate'])
    Q[idx, stim, action] = q_new
return -np.sum(llh)
You seem to have a basic shape broadcasting issue there. Doing something like Qstim = Q[idx[:, None], stim] might help, but you really need to check whether things are broadcasting as expected.
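To see why that kind of indexing fails, here is a small self-contained numpy sketch (the shapes are toy values for illustration, not the real data): a 1-D index array cannot broadcast against a 2-D one, but adding a trailing axis with `idx[:, None]` makes the two index arrays compatible.

```python
import numpy as np

# Toy Q-table: 3 blocks x 4 stimuli x 2 actions (made-up shapes)
Q = np.arange(3 * 4 * 2, dtype=float).reshape(3, 4, 2)

idx = np.array([0, 2])                   # shape (2,): selected blocks
stim_2d = np.tile(np.arange(4), (2, 1))  # shape (2, 4): a 2-D stimulus index

# Q[idx, stim_2d] raises IndexError: shapes (2,) and (2, 4)
# cannot be broadcast together, just like in the traceback above.
try:
    Q[idx, stim_2d]
except IndexError as e:
    print("fails:", e)

# idx[:, None] has shape (2, 1), which broadcasts against (2, 4):
Qstim = Q[idx[:, None], stim_2d]
print(Qstim.shape)  # (2, 4, 2)
```

The broadcast index pair selects, for every (block, stimulus) combination, the full action slice, which is why the trailing action dimension survives in the result.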
DensityDist also requires a Theano function. Most numpy functions have a Theano counterpart, and if your function does not have complicated branch logic it is usually just a question of replacing the numpy functions with Theano ones.
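For instance, the one piece of branch logic in nLLH, the early return when some likelihood is 0, can be rewritten as an elementwise select, which does have a direct Theano counterpart in `tt.switch`. A minimal numpy sketch (the helper name `safe_log` is mine, not from the code above):

```python
import numpy as np

def safe_log(lh):
    # Elementwise version of `if np.any(lh == 0): return np.inf`:
    # zero-likelihood trials map to -inf, so the summed log-likelihood
    # becomes -inf instead of requiring an early return. The Theano
    # analogue would be tt.switch(tt.eq(lh, 0), -np.inf, tt.log(lh)).
    with np.errstate(divide="ignore"):
        return np.where(lh == 0, -np.inf, np.log(lh))

out = safe_log(np.array([0.5, 0.0, 0.25]))
print(out)  # [log(0.5), -inf, log(0.25)]
```

Once every Python-level branch is expressed this way, swapping `np.*` for `tt.*` is mostly mechanical.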
About the next PyMC version: it won't use Theano anymore but our fork, Aesara, which keeps most of the API and requirements the same. So you will need the same type of solution.
I see, interesting. This nLLH function runs just fine on its own. I started my project using scipy's minimizer to do maximum likelihood estimation, which worked fine, but now I think Bayesian estimation is the answer, hence PyMC. The Theano issue aside, is the syntax right for pm.Potential? The notebook doesn't seem to have examples where priors have a shape larger than one.
Does using a Theano wrapper, like in that tutorial, make the code run slower compared to rewriting the likelihood function in Theano? Or is the speed similar?
It really depends. Some Theano code can be pretty fast after optimization and compilation; other times a black-box function implemented in C/Numba/JAX can be greatly superior.
If you haven't done anything special, my bet would be that the Theano version would be faster and more numerically stable than your Python implementation.
Another advantage of using Theano is that you don't have to implement your own gradient computations; you get that for free with Theano's autodiff.