Model comparison for individual and combined datasets


So this is half a stats question, half a PyMC3 question. Let’s say I have data from 5 participants. And let’s say I have 3 models that I want to evaluate (although the actual numbers in my context are much higher).

Analysis 1
I might be quite interested in fitting each of the 3 models to each of the 5 participants entirely independently. That’s fine, we can do that, and use the WAIC/LOO model comparison to come up with the best model for each participant separately. This is fine.

Analysis 2
Now let’s say I want to know which is the best of my 3 models to account for all participants. The question is what is the best way to go about this?

Solution A
The first solution would be to create a new set of PyMC3 models which iterates over participants, still treating them as statistically independent. Then we could just do one model comparison and that will give us the answer. However, this is perhaps not a great solution because a) we have to write a new set of PyMC3 models, b) these models are now potentially very large if we have many participants and data samples per participant, c) the dimensionality of the posterior grows very large.

Solution B?
So… my thought was to combine the log likelihood values obtained from each individual ‘fit’ to each model x participant combination? So we pool the log likelihood values for all participants for a given model, obtaining a larger set of log likelihood values for each model… Then do one single WAIC/LOO model comparison on that?

  • Is this statistically kosher? (I think it is)
  • How might we go about doing this? My feeling is that it would require changes to the function.

It could be that this is a bit niche, but I personally think that there would be a number of use cases for this and so could potentially merit looking in to.

Solution C
Or is it possible to simply sum some of the WAIC/LOO deviances across models after having calculated them for each participant/model combination as in Analysis 1?


Why not fit 3 large hierarchical models for all subjects, you have appropriate pooling and regularization of parameters from hyperpriors. It is similar to your solution A but not treating each participant independent.

I guess your solution B and C should work as well as an approximation; if all 3 models are equally likely a prior. Some refactoring of the compare function is needed, but the aim is to compare the expected log pointwise predictive density (elpd). In regular use case when you are comparing 3 models independently for each participant, elpd is the same size as the number of observation of that participant (say n trial). So if you are concatenating all subjects together and treating them as one, elpd of each model is now the same size as n_trial*n_subject. I have this work in progress function that might help you a bit.

I think this is valid to do at least as an approximation, but you really should read and do some math/simulation to make sure.


Thanks for that.

The motivation for avoiding a single model (with or without hyperparameters) was to be able to deal with situations where you might have ~100 data points per participant, and possible 1000’s of participants. If each model has 3 parameters (for example) then are ~3000 parameter models feasible to estimate in PyMC3?

I’m happy to implement single models which fit all participants (solution A), just that for my analysis plan I’d want to look at both individual fits and collective fits, just it would be slightly simpler if I could do that with only one single-participant set of models. But it looks like Solution A is going to be the simplest for the moment… implement single participant models, and multiple participant equivalents (either hierarchical or not).


Well, it is possible if it is not a Gaussian process :joy: (I mean it is possible with GP as well, just that it might be really slow…)