Problems when fitting a reinforcement learning model to data

Hi all, I’m new to Bayesian statistical modeling and I’m trying to fit the parameters of a reinforcement learning model to self-generated data. No matter how I change the priors, I keep running into these warnings:

/python3.9/site-packages/pymc3/step_methods/hmc/quadpotential.py:224: RuntimeWarning: divide by zero encountered in true_divide
  np.divide(1, self._stds, out=self._inv_stds)
/python3.9/site-packages/pymc3/step_methods/hmc/quadpotential.py:203: RuntimeWarning: invalid value encountered in multiply
  return np.multiply(self._var, x, out=out)

What might be the possible reasons for these warnings?
Below is my code for the update function and the model:

def update_RW(stim_matrix, shock_matrix, vS, combined_v, lr, no_subjects):
    """
    Update function of the Rescorla-Wagner model; used as the step
    function of theano.scan in the pymc3 model below.
    """
    # vS holds [v for CS-, v for CS+] per subject, shape = (no_subjects, 2)
    pe = shock_matrix - vS[tt.arange(no_subjects), stim_matrix]
    vS = tt.set_subtensor(vS[tt.arange(no_subjects), stim_matrix],
                          vS[tt.arange(no_subjects), stim_matrix] + lr * pe)

    # if stim == 1, store v for CS+, else store v for CS-
    combined_v = tt.set_subtensor(combined_v[tt.arange(no_subjects), 0],
                                  tt.switch(tt.eq(stim_matrix, 1),
                                            vS[tt.arange(no_subjects), 1],
                                            vS[tt.arange(no_subjects), 0]))

    return vS, combined_v
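(For clarity, each scan step is meant to do the equivalent of the following plain-NumPy sketch. This NumPy version is only an illustration of my intent and is not used in the model itself.)

```python
import numpy as np

def update_RW_numpy(stim, shock, vS, lr):
    """Plain-NumPy sketch of one Rescorla-Wagner step: for each subject,
    update the value of the cued stimulus by lr * prediction error."""
    subj = np.arange(len(stim))
    pe = shock - vS[subj, stim]        # prediction error per subject
    vS = vS.copy()                     # avoid mutating the caller's array
    vS[subj, stim] += lr * pe          # delta-rule update on the cued CS
    # if stim == 1, report v for CS+, else v for CS-
    combined_v = np.where(stim == 1, vS[:, 1], vS[:, 0])
    return vS, combined_v

# e.g. two subjects, both starting at v = 0.5 with lr = 0.5:
vS1, cv = update_RW_numpy(np.array([1, 0]), np.array([1, 0]),
                          np.full((2, 2), 0.5), np.array([0.5, 0.5]))
# → vS1 = [[0.5, 0.75], [0.25, 0.5]], cv = [0.75, 0.25]
```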


with pm.Model() as RW_model:

    beta0 = pm.Normal('beta0',0, 1)
    beta1 = pm.Normal('beta1',0, 1)
    
    lr_hypermu = pm.Normal('lr_hypermu', 0,1)
    lr_hypersd = pm.HalfCauchy('lr_hypersd',beta=5)
    lr = pm.Bound(pm.Normal, lower=0.0, upper=1.0)('lr',mu=lr_hypermu, sigma=lr_hypersd, shape=no_subjects)
    
    vS = 0.5 * tt.ones((no_subjects,2), dtype='float64') 
    combined_v = 0.5 * tt.ones((no_subjects,1), dtype='float64') 

    outputs, updates = theano.scan(
        fn=update_RW,
        sequences=[stim_matrix, shock_matrix],
        outputs_info=[vS, combined_v],
        non_sequences=[lr, no_subjects])
    
    combined_v_ = outputs[1].reshape(est_scr_matrix.shape)
    est_scr_model = combined_v_ * beta0 + beta1
    
    eps = pm.HalfNormal('eps', 5)
    scrs = pm.Normal('scrs',est_scr_model,eps,observed=est_scr_matrix)

    trace = pm.sample(target_accept=0.9, chains=4, cores=10, return_inferencedata=True)

Just to rule out the easy stuff first: do you have any missing values in your inputs?


Thanks for your reply!
There are no missing values in my inputs. For ease of reading, I only show the first 5 elements of each input when its length is greater than 5. Below are the inputs I checked:

> combined_v.eval()
array([[0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5]])
> vS.eval()
array([[0.5, 0.5],
       [0.5, 0.5],
       [0.5, 0.5],
       [0.5, 0.5],
       [0.5, 0.5]])
> stim_matrix[:5].eval()
array([[1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0]])
> shock_matrix[:5].eval()
array([[1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0]])
> no_subjects
5
> est_scr_matrix[:5]
array([[-1.9896787 , -2.27735786, -2.07463294, -2.20171174, -2.19734734],
       [-0.49669095, -0.20901179, -0.41173671, -0.28465791, -0.28902232],
       [-0.75986582, -0.31556975, -0.64359855, -0.4463102 , -0.4535504 ],
       [-1.81928559, -2.1817793 , -1.90742927, -2.06732156, -2.06118915],
       [-0.23351609, -0.10245383, -0.17987487, -0.12300563, -0.12449423]])


I forgot to include one piece of information in my original post: these warnings appear around 33%-45% of the way through the progress bar.


33.65% [2692/8000 1:47:22<3:31:43 Sampling 4 chains, 0 divergences]

Edit:
Instead of the priors in the original post, I’ve tried fitting with the simplest priors I could think of (more details in the code block below). I got the same warnings.

 beta0 = pm.Normal('beta0',0,1)
 beta1 = pm.Normal('beta1',0,1)
 lr = pm.Beta('lr', 1,1, shape=no_subjects)

I’ve also checked my update function with the following setup:

# test the functionalities of theano.scan and theano.function with fn = update_RW
lr_ = np.random.uniform(low=0.0, high=1.0, size=no_subjects)
lr = tt.vector('lr')
vS = 0.5 * tt.ones((no_subjects,2), dtype='float64') 
combined_v = 0.5 * tt.ones((no_subjects,1), dtype='float64') 

outputs, updates = theano.scan(
    fn=update_RW,
    sequences=[stim_matrix, shock_matrix],
    outputs_info=[vS, combined_v],
    non_sequences=[lr, no_subjects])

op = theano.function(
  inputs=[lr],
  outputs= outputs
)

oval= op(lr_)
> oval[1].reshape(est_scr_matrix.shape)[:5]
array([[0.82372604, 0.94848156, 0.86056746, 0.91567671, 0.91378403],
       [0.17627396, 0.05151844, 0.13943254, 0.08432329, 0.08621597],
       [0.29040291, 0.09772858, 0.23998221, 0.15442574, 0.15756555],
       [0.74983301, 0.90703277, 0.78805758, 0.85739679, 0.8547374 ],
       [0.06214502, 0.0053083 , 0.03888286, 0.01422083, 0.01486639]])

with pm.Model() as RW_model:
    ##############################################
    beta0 = pm.Normal('beta0',0,1)
    beta1 = pm.Normal('beta1',0,1)
    ##############################################
    lr = pm.Beta('lr', 1,1, shape=no_subjects)
    ##############################################
    
    # step function
    vS = 0.5 * tt.ones((no_subjects,2), dtype='float64') 
    combined_v = 0.5 * tt.ones((no_subjects,1), dtype='float64') 

    outputs, updates = theano.scan(
        fn=update_RW,
        sequences=[stim_matrix, shock_matrix],
        outputs_info=[vS, combined_v],
        non_sequences=[lr, no_subjects])
    
    combined_v_ = outputs[1].reshape(est_scr_matrix.shape)
    est_scr_model = combined_v_ * beta1 + beta0
    ##############################################
    # Model error
    eps = pm.HalfNormal('eps', 0.5)
    
    # Data likelihood
    scrs = pm.Normal('scrs',est_scr_model,eps,observed=est_scr_matrix)
    ##############################################
    
    trace = pm.sample(target_accept=0.9, chains=4, cores=10, return_inferencedata=True)

The snippet above is the simplest version of my model. I’ve narrowed the problem down to the beta0 parameter and the equation est_scr_model = combined_v_*beta1 + beta0. The divide-by-zero warning appears when the following exception is thrown:

".../pymc3/step_methods/hmc/quadpotential.py", line 272, in raise_ok
    raise ValueError("\n".join(errmsg))
ValueError: Mass matrix contains zeros on the diagonal. 
The derivative of RV `beta0`.ravel()[0] is zero.

It looks like something goes wrong when the quadpotential is computed for beta0. Strangely, the problem disappears if I keep only one of the betas, i.e. changing the equation to either est_scr_model = combined_v_*beta1 or est_scr_model = combined_v_ + beta0 makes the warnings go away.
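To convince myself where the warning itself comes from, I reproduced the mechanism in plain NumPy. This is only a sketch of my reading of the quadpotential lines in the traceback (that during tuning a per-parameter standard deviation is estimated from the draws, and inverting it overflows when a parameter's draws never move), not pymc3's actual code path:

```python
import numpy as np

# The second entry plays the role of a "stuck" parameter like beta0,
# whose tuning-window draws never moved, leaving its estimated std at 0.
stds = np.array([1.0, 0.0])
with np.errstate(divide="ignore"):
    inv_stds = np.divide(1.0, stds)

print(inv_stds)  # → [ 1. inf]: the 0-std entry becomes inf,
                 # matching "Mass matrix contains zeros on the diagonal"
```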

What should I do if I want to keep both betas?

I can’t quite get your code to work. The call to theano.scan() is throwing an error, but I suspect that’s because I had to cobble a script together from your multiple posts. Would it be possible to provide a single, self-contained block of code that runs?