Divided by zero encountered when fitting a RL model

HuangHam · July 19, 2022, 9:41pm

I have been enjoying fitting RL model using pymc4 so far! However I just now encountered an issue that I have no idea how to fix. In my study, I aim to compare two RL models. RL1 is just the classic Q learning model. RL2 is pretty much the same except I added another parameter that dictates how my task condition may change reward representation. I first simulated data from the priors of RL1 and RL2, and then fit using pymc. Both did not give any error. However, when I fit them onto my real data, RL1 fit just fine, RL2 gives me the following error:

RuntimeWarning: divide by zero encountered in true_divide
RuntimeWarning: invalid value encountered in multiply
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV alpha_logodds__.ravel()[[0]] is zero.
The derivative of RV beta_log__.ravel()[[0]] is zero.
The derivative of RV decay_logodds__.ravel()[[0]] is zero.
The derivative of RV gamma.ravel()[[0]] is zero.

My first thought is that maybe the likelihood function of my RL2 did not succeed in dealing with missing data NaN. The real data has NaN but the simulated data obviously does not. Then I separately tried to run the likelihood function of RL2 on the real data with some hand-typed parameter values. Surprisingly the function ran just fine. Which means my likelihood function deals with NaN just fine. But since RL2 did not give any error on the simulation data, then the model specification should also be fine. Could anyone please tell me what I should do in this case? Is there a way for me to print aesara objects in real-time so I can trouble shoot exactly which stage went wrong?

ricardoV94 · July 20, 2022, 6:06am

I doubt your likelihood will handle nan out of the box, if you didn’t do anything specific tp account for them.

Can you show the complete code that reproduces the problem you are observing?

HuangHam · July 20, 2022, 12:36pm

I think I have a hypothesis. So in my likelihood function, there is a line:
utility_ = reward_*split_
All three are aesara tensors but reward_ may include NaN. Is it true that when I simply call the likelihood function, this line works fine because just like in numpy, NaN times anything will still give you NaN. However, pymc requires gradient computation from it, in which case NaN may break it. Is it true? If so, is there a NaN safe way to multiply two tensors?

ricardoV94 · July 20, 2022, 4:50pm

You have to decide what you want to do with the nan. Ideally you would not include them at all/ remove them during pre-processing

HuangHam · July 20, 2022, 5:33pm

I see I see. I need them as a placeholder of a trial because my model builds in a forgetting mechanism. If there isn’t any convenient aesara function, this workaround I think worked:

idx_noNaN_t, idx_noNaN_b, idx_noNaN_s = at.nonzero(~at.isnan(rewards_))
utility_ = rewards_.copy()
utility_ = at.set_subtensor(utility_[idx_noNaN_t, idx_noNaN_b, idx_noNaN_s], rewards_[
idx_noNaN_t, idx_noNaN_b, idx_noNaN_s] * split_[idx_noNaN_t, idx_noNaN_b, idx_noNaN_s])

Essentially I first extracted the indices of non-NaN elements, copys rewards_ into utility_, then use at.set_subtensor to reassign only the non-NaN elements into reward*split. In case other people ran into similar issues.

Topic		Replies	Views
Problems when fitting a reinforcement learning model to data Questions modeling	4	513	April 20, 2022
"ValueError: Mass matrix contains zeros on the diagonal" error, how to debug? Questions	7	1600	March 7, 2020
What is this about?: step_methods/hmc/quadpotential.py:224: Questions	5	664	July 28, 2021
Reinfocement learning model - derivative of RV is zero Questions	5	1764	April 10, 2019
Fitting a simple Reinforcement Learning model to behavioral data with PyMC (Jupyter NB) Sharing	27	3320	April 28, 2023

Divided by zero encountered when fitting a RL model

Related topics