Hi, I’m trying to run parameter recovery for various types of Q-learning-based models (similar to the ones in the PyMC tutorial).
I created a colab notebook with a working example for a Q-learning model with a learning rate, a softmax parametrised by an inverse temperature, and a decay parameter.
The model estimates group- and subject-level parameters for 50 subjects using a non-centered parametrisation.
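For context, this is a stripped-down sketch of the kind of non-centered set-up I mean (placeholder names, only the learning rate shown; the actual colab model also has the inverse temperature, the decay, and the Q-learning likelihood):

```python
import pymc as pm

# Simplified sketch (placeholder names, not the exact colab code) of the
# non-centered hierarchy, shown only for the learning rate; inverse
# temperature and decay follow the same pattern, and the Q-learning
# likelihood is omitted.
n_subjects = 50

with pm.Model() as model:
    mu_alpha = pm.Normal("mu_alpha", 0.0, 1.0)       # group mean on the logit scale
    sigma_alpha = pm.HalfNormal("sigma_alpha", 1.0)  # group-level spread
    alpha_offset = pm.Normal("alpha_offset", 0.0, 1.0, shape=n_subjects)
    # non-centered: subject value = mu + sigma * offset, squashed back to (0, 1)
    alpha = pm.Deterministic(
        "alpha", pm.math.invlogit(mu_alpha + sigma_alpha * alpha_offset)
    )
```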
Everything seems to be working fine, but I consistently get suspiciously precise estimates for the third parameter, as shown in the plots below.
The learning rate and inverse temperature recover well, while the decay estimates are tightly packed around the same value.
I tested similar models with lapse rates, associability weights, choice kernels, and additional parameters.
I consistently get good estimates for the learning rate and inverse temperature, but the other parameters are always recovered within a very narrow range.
I tried varying the sampler (PyMC and NumPyro), the inference method (variational inference), the number of trials and subjects, and re-parametrising some of the models, but nothing seems to help.
When I had similar issues in the past using MLE/MAP, it was usually due to the recovery process introducing correlations between the recovered parameters, but here that doesn’t seem to be the case (see the plot on the colab).
From other studies, I know these models should be able to recover their parameters quite well (r > 0.7 or higher) on this type of task with this number of subjects.
Is there anything I’m missing in the way I specify my model?
Or maybe, is there any obvious way to prevent/reduce this issue or to debug it?
I guess my first suggestion would be to “unbuild” the hierarchy and try to run the model with a single subject’s worth of data and see where things stand at that point. I find debugging more complex models difficult, and simplifying often helps. Have you tried that?
I’ve tried repeating the recovery for each subject individually with a centered parametrisation (learning rate ~ Beta(1,1), inverse temperature ~ HalfNormal(sigma=10), decay ~ Beta(1,1)), and things improve a bit.
I also tried increasing the number of trials from 80 to 200 (maintaining reversals every 20 trials), and the difference is not massive: the Pearson r for decay goes from 0.52 to 0.62.
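For reference, the Pearson r above could be computed along these lines (hypothetical names: `true_decay` holds the simulated values and `idata` a trace containing the subject-level decay estimates):

```python
import numpy as np

# Sketch (hypothetical names): compare the simulated "true" decay values
# with the posterior-mean estimates from the fit stored in `idata`.
recovered_decay = idata.posterior["decay"].mean(dim=("chain", "draw")).values
r = np.corrcoef(true_decay, recovered_decay)[0, 1]
print(f"decay recovery: Pearson r = {r:.2f}")
```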
I changed the distributions from which I sample the simulation parameters to be more concentrated (in particular for decay, which I changed to Beta(0.5, 1.5), as most of the decay parameters should be small), and recovery looks much better (single-subject fitting, centered parametrisation, 80 trials).
Based on this, I guess it is an issue with the model not behaving well across the entire parameter space and requiring stronger priors on its parameters.
Could this make sense?
If so, what would be the most sensible way to adapt the Beta(0.5, 1.5) to work with a non-centered parametrisation?
PyMC’s Beta distribution makes a mu/sigma parameterization available (an alternative to the standard alpha/beta parameterization), which should allow you to use a relatively standard non-centered set-up (i.e., subjects inherit their mus from a hyper-parameter, etc.). Does that make sense?
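For example, a minimal sketch of that set-up could look like the following (assumed names and priors, not tuned to your model; note that the mu/sigma parameterization requires sigma^2 < mu * (1 - mu), so keep the group-level spread modest):

```python
import pymc as pm

# Sketch (assumed names/priors) of a hierarchy on the Beta's mu/sigma
# parameterization: subjects inherit their mean decay from a group-level
# hyper-parameter. A valid Beta needs sigma**2 < mu * (1 - mu), so the
# prior on the group-level sigma is kept deliberately tight.
n_subjects = 50

with pm.Model() as model:
    decay_mu = pm.Beta("decay_mu", alpha=0.5, beta=1.5)    # group-level mean in (0, 1)
    decay_sigma = pm.HalfNormal("decay_sigma", sigma=0.1)  # group-level spread
    decay = pm.Beta("decay", mu=decay_mu, sigma=decay_sigma, shape=n_subjects)
```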
Thanks! I will have a look at that parametrisation for beta.
Some of the parameters I’m interested in don’t have “obvious” distributions like the decay example above and can be more spread out across their entire possible range, similar to the learning rate, which can take any value in [0,1].
Is there anything else that could be done to prevent the estimates of some parameters from collapsing around the same value when using a non-centered parametrisation?
I’m trying to look for examples of similar issues but I can’t find much.
Any ideas/pointers would be greatly appreciated
If you think it’s the hierarchy that is driving these pathologies, I would encourage you to investigate the sampling itself and not just the accuracy of your parameter recovery. For example, the decay parameter may not be all that identifiable and may exhibit strong dependencies with other model parameters. This notebook (which you may be familiar with) provides some examples of how to do this in the case of hierarchical models. On top of that, I would not assume that a non-centered parameterization will necessarily be better; it is not a universal truth that non-centered is superior to centered.
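For example, something along these lines (hypothetical variable names, assuming the trace from the hierarchical fit is stored in `idata`):

```python
import arviz as az

# Sketch (hypothetical variable names): look at dependencies between the
# group-level parameters and at divergent transitions, not only at the
# recovery correlations. Subject-level parameters can be added to the pair
# plot via var_names / coords as needed.
az.plot_pair(
    idata,
    var_names=["decay_mu", "decay_sigma"],  # assumed names
    divergences=True,                       # highlight divergent transitions
)
print(az.summary(idata, var_names=["decay_mu", "decay_sigma"]))  # r_hat / ESS
```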
Thank you!
I will check that notebook out. I currently don’t have divergences when sampling, but hopefully there’s something there I can use to fix my current issues.
Hi,
I had a look at Betancourt’s tutorial and I’ve plotted the pair plots between the different parameters I’m estimating (I’ve updated the colab notebook above).
This is an example for one subject, focusing on the decay parameter.
I cannot see anything obviously wrong but maybe I’m missing something.
I also looked at the likelihood space and it looked ok.
Is there anything else that I could check to debug this model, or is it just misspecified?
Also, does the scale of the parameters have an effect on the quality of the sampling?
Beta is roughly in [0, 20], while alpha and decay are in [0, 1]. Could that be an issue, or are MCMC samplers OK at handling that?
Is the model failing to converge, as in the 50% posterior HDI clearly does not cover the true parameter 50% of the time, or is it just slow to converge, as in the HDI is simply very wide all the time?
The second could just mean you don’t have enough data relative to the prior uncertainty to inform the posterior as much as you would like. You can evaluate that by increasing the data size / reducing noise.
Convergence (the first problem) may also be a function of data signal, in that a model will do very poorly for some regimes of data but okay for others.
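For example, a quick coverage check could look like this (hypothetical names: `idata` is the fitted trace and `true_decay` the array of decay values used to simulate the data):

```python
import numpy as np
import arviz as az

# Sketch (hypothetical names): does the 50% posterior HDI contain the
# simulated "true" decay for roughly 50% of subjects?
hdi = az.hdi(idata, var_names=["decay"], hdi_prob=0.5)["decay"]
lower = hdi.sel(hdi="lower").values
upper = hdi.sel(hdi="higher").values
coverage = np.mean((true_decay >= lower) & (true_decay <= upper))
print(f"50% HDI coverage of true decay: {coverage:.2f}")
```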
It’s more similar to the first case: I simulate choice data with the decay parameter sampled from a Uniform(0, 1), and all the estimates I recover are around 0.5 with very small variance.
The recovered estimates for the learning rates and inverse temperatures, on the other hand, have distributions that closely match the ones I used to simulate the choice data.