Fitting a simple Reinforcement Learning model to behavioral data with PyMC (Jupyter NB)

Weird. I tried running with 4 cores yesterday and the computer still crashed because of the memory issue. For my current purpose, I’ll just reduce the number of cores. I’d love to know if this problem gets fixed in the future! As for the “other issue”, what I meant was not the difference in the expected time, but that the number of chains appeared to have changed. For example, if I use 2 cores or more, the progress bar shows 29/16000, but if I set cores=1, it shows only 11/4000, which doesn’t look right. And there is no “Sampling 4 chains” but instead “Sampling chain 0”. I don’t think the number of chains should depend on the number of cores. I don’t know whether it’s only printing the wrong message, or whether it’s somehow sampling only one chain when I set cores=1 even though the chains parameter is 4.

I don’t remember what the output with multiple chains and 1 core should be. You can check with a simple fast model to see if there’s a bug indeed. It shouldn’t have anything to do with your complex model.


I tried a simple example and I know what the “problem” is. The model I used was taken from one of the pymc online tutorials:

import arviz as az
import numpy as np
import pymc as pm

RANDOM_SEED = 8927
rng = np.random.default_rng(RANDOM_SEED)
az.style.use("arviz-darkgrid")
with pm.Model() as model:
    mu = pm.Normal("mu", mu=0, sigma=1)
    obs = pm.Normal("obs", mu=mu, sigma=1, observed=rng.standard_normal(100))

If I sample with 4 chains and 2 cores:
idata = pm.sample(2000, chains=4, cores=2)
It works fine: it samples 4 chains for a total of 12000 iterations (8000 draws + 4000 tuning):
[12000/12000 00:06<00:00 Sampling 4 chains, 0 divergences]
However, if I sample 4 chains with 1 core (idata = pm.sample(2000, chains=4, cores=1)), I get:
[3000/3000 00:03<00:00 Sampling chain 0, 0 divergences]
so the progress bar covers only one chain (3000 iterations: 2000 draws + 1000 tuning). However, this is repeated 4 times, once per chain, e.g. [3000/3000 00:01<00:00 Sampling chain 1, 0 divergences]. In summary, if you use only one core, pymc starts sampling the next chain, in a separate progress bar, only after it finishes the previous chain. But if you use cores > 1, pymc displays a single progress bar that lumps all chains together. So I think there isn’t a bug after all. It’s just displayed differently.
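
For anyone checking the numbers, here is a small sketch of the arithmetic behind the progress-bar totals described above. The helper name progress_totals is made up for illustration and is not part of pymc. It only encodes the behavior reported in this thread (one lumped bar with cores > 1, one bar per chain with cores=1), assuming the tune=1000 that matches the 3000-per-chain output above. Other pymc versions may display progress differently.

```python
def progress_totals(draws, tune, chains, cores):
    """Return the totals shown by each progress bar, as observed in this thread.

    Hypothetical helper for illustration only; not a pymc function.
    """
    per_chain = draws + tune  # every chain runs the tuning steps plus the draws
    if cores > 1:
        # Parallel sampling: a single bar that lumps all chains together.
        return [per_chain * chains]
    # Sequential sampling: one bar per chain, shown one after another.
    return [per_chain] * chains


print(progress_totals(2000, 1000, chains=4, cores=2))
print(progress_totals(2000, 1000, chains=4, cores=1))
```

Both calls add up to the same 12000 iterations; only the grouping into bars differs.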

I think I now have a good theory of why pm.DensityDist required far more memory than pm.Potential in my case. Fitting RL models involves inputs of different types. For example, the reward may be a float but the action must be an integer. In your notebook example using pm.Potential, you first separated these inputs of different types before converting them into aesara tensors. However, I thought pm.DensityDist takes only one aesara object as the data input, which forced me to first merge all inputs of different types into one aesara tensor, and then later split them and cast them back to the appropriate types. Maybe changing the type of an aesara object requires additional memory.

DensityDist can receive multiple inputs, you don’t need to merge everything.

But yes, that’s a plausible explanation for the differences you found.
