So this is half a stats question, half a PyMC3 question. Let’s say I have data from 5 participants. And let’s say I have 3 models that I want to evaluate (although the actual numbers in my context are much higher).

**Analysis 1**

I might be quite interested in fitting each of the 3 models to each of the 5 participants entirely independently. That's straightforward: we can do that, and use WAIC/LOO model comparison to come up with the best model for each participant separately.

**Analysis 2**

Now let’s say I want to know which is the best of my 3 models to account for *all* participants. The question is what is the best way to go about this?

**Solution A**

The first solution would be to create a new set of PyMC3 models which iterate over participants, still treating them as statistically independent. Then we could just do one model comparison, and that would give us the answer. However, this is perhaps not a great solution because (a) we have to write a new set of PyMC3 models, (b) these models become potentially very large if we have many participants and many data samples per participant, and (c) the dimensionality of the posterior grows very large.

**Solution B?**

So… my thought was to combine the log-likelihood values obtained from each individual fit for each model × participant combination. We pool the log-likelihood values across all participants for a given model, obtaining a larger set of log-likelihood values per model, and then do one single WAIC/LOO model comparison on that.

- Is this statistically kosher? (I think it is)
- How might we go about doing this? My feeling is that it would require changes to the `pm.compare` function.
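Mechanically, the pooling in Solution B could be sketched with plain NumPy: given each fit's pointwise log-likelihood matrix (posterior samples × observations, which is how PyMC3/ArviZ store it), concatenate along the observation axis and compute WAIC on the pooled matrix. The array shapes, random data, and `waic_from_pointwise` helper below are illustrative assumptions, not PyMC3 API; pooling this way requires the same number of posterior samples in every fit.

```python
import numpy as np

def waic_from_pointwise(log_lik):
    """WAIC (deviance scale) from a (n_posterior_samples, n_observations)
    pointwise log-likelihood matrix."""
    # lppd: log pointwise predictive density, averaging over posterior samples
    lppd = np.sum(np.log(np.mean(np.exp(log_lik), axis=0)))
    # p_waic: effective number of parameters (variance of log-lik across samples)
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2 * (lppd - p_waic)

# hypothetical per-participant matrices from 5 independent fits of one model
rng = np.random.default_rng(0)
per_participant = [rng.normal(-1.0, 0.1, size=(1000, 100)) for _ in range(5)]

# pool by concatenating along the observation axis, then compare models as usual
pooled = np.concatenate(per_participant, axis=1)  # shape (1000, 500)
total_waic = waic_from_pointwise(pooled)
```

Because both the lppd and p_waic terms are sums over observations, the pooled WAIC equals the sum of the per-participant WAICs, which is what makes this kind of pooling coherent in the first place.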

It could be that this is a bit niche, but I personally think there would be a number of use cases for this, so it could potentially merit looking into.

**Solution C**

Or is it possible to simply sum the WAIC/LOO deviances across participants for each model, after having calculated them for each participant/model combination as in Analysis 1?

Why not fit 3 large hierarchical models across all subjects? You get appropriate pooling and regularization of parameters from the hyperpriors. It is similar to your Solution A, but without treating each participant as independent.

I guess your Solutions B and C should work as well, as an approximation, if all 3 models are equally likely a priori. Some refactoring of the compare function is needed, but the aim is to compare the expected log pointwise predictive density (elpd). In the regular use case, when you are comparing 3 models independently for each participant, the elpd is the same size as the number of observations for that participant (say n_trial). So if you are concatenating all subjects together and treating them as one, the elpd of each model is now of size `n_trial*n_subject`. I have this work-in-progress function that might help you a bit.

I *think* this is valid to do at least as an approximation, but you really should read https://arxiv.org/abs/1704.02030 and do some math/simulation to make sure.

Thanks for that.

The motivation for avoiding a single model (with or without hyperparameters) was to be able to deal with situations where you might have ~100 data points per participant and possibly 1000s of participants. If each model has 3 parameters (for example), then are ~3000 parameter models feasible to estimate in PyMC3?

I’m happy to implement single models which fit all participants (Solution A); it’s just that for my analysis plan I’d want to look at both individual fits and collective fits, and it would be slightly simpler if I could do that with only one single-participant set of models. But it looks like Solution A is going to be the simplest for the moment… implement single-participant models, and multiple-participant equivalents (either hierarchical or not).

Well, it is possible if it is not a Gaussian process (I mean it is possible with a GP as well, just that it might be really slow…)

Hi there,

Thank you both for the clear discussion, which has been very helpful and educational. I have a very naive question related to the same scenario (something like “analysis 2” in the original post):

How meaningful would it be to rely on the multiplication of WAIC/LOO probabilities (weights) for each model across all participants to assess which model is better across all participants?
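To make the question concrete, here is what multiplying per-participant weights would look like mechanically (all numbers invented). Note that this is equivalent to summing log-weights, implicitly assumes independence across participants, and tends to concentrate on a single model very quickly, so whether it is *meaningful* is a separate statistical question:

```python
import numpy as np

# hypothetical WAIC/LOO weights: rows = 5 participants, columns = 3 models;
# each participant's row sums to 1
w = np.array([
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.7, 0.2, 0.1],
    [0.4, 0.4, 0.2],
    [0.6, 0.2, 0.2],
])

# multiply across participants, then renormalize to get one weight per model
combined = w.prod(axis=0)
combined /= combined.sum()
```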

then are ~3000 parameter models feasible to estimate in PyMC3?

Sure. I once fitted a model with data from ~9000 individual participants, each of whom had several parameters (at least 3, I think). You just have to put them all in the same vector and use matrix algebra to map them to the correct values in your dependent variables.
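As a sketch of that vector-plus-indexing pattern (the names and sizes here are made up): give each parameter a `shape=n_subj` vector and index it with a per-observation subject index. The same fancy indexing works on PyMC3 random variables inside a model context, e.g. `alpha[subj_idx]` after `alpha = pm.Normal('alpha', 0, 1, shape=n_subj)`.

```python
import numpy as np

# hypothetical sizes: many participants, a few trials each
n_subj, trials_per_subj = 1000, 10
rng = np.random.default_rng(1)

# one entry per participant, all packed into a single flat vector
alpha = rng.normal(size=n_subj)

# subj_idx assigns every observation (trial) to its participant
subj_idx = np.repeat(np.arange(n_subj), trials_per_subj)

# indexing the parameter vector by subj_idx maps each participant's value
# onto all of their trials in one vectorized expression
mu = alpha[subj_idx]  # shape (n_subj * trials_per_subj,)
```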

it would be slightly simpler if I could do that with only one single-participant set of models.

Perhaps I’m wrong about this, but if the model is non-hierarchical and one participant’s parameters are independent from another participant’s parameters then fitting one big model is equivalent to fitting many smaller models. I think that WAIC/LOO values are given elementwise (e.g., per trial in an experiment). I had always assumed that from the big model you can get the WAIC/LOO for the smaller models simply by summing the elementwise values over the appropriate individual elements, but perhaps this is not true?

Obviously, this is not true if the big model is hierarchical, because due to partial pooling you will end up with different parameter estimates. However, I think it is still theoretically valid to carve up the big model; you simply used prior information to get better parameter estimates in the individual models.
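The additivity part of this claim can be checked mechanically: because the elpd is a sum of per-observation terms, summing a big non-hierarchical model's elementwise values over each participant's observations recovers per-participant totals, and those totals add back up to the big model's value. A toy sketch, with all numbers invented:

```python
import numpy as np

# hypothetical pointwise elpd values from one big non-hierarchical model:
# 5 participants x 100 trials each, flattened in participant order
rng = np.random.default_rng(2)
elpd_pointwise = rng.normal(-1.0, 0.2, size=5 * 100)

# carve the big model's elpd into per-participant totals by summing
# each participant's block of elementwise values
per_subj_elpd = elpd_pointwise.reshape(5, 100).sum(axis=1)

# the participant totals add back up to the big model's total elpd
assert np.isclose(per_subj_elpd.sum(), elpd_pointwise.sum())
```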

Hi, I kind of have the opposite question. I fit a model hierarchically, drawing all three parameters for every subject from the same set of priors. I end up getting one WAIC score per model, covering all subjects, for model comparison. I also hope to compare models for each individual subject. Is there a way for me to do that without having to fit the model for each individual subject again?

```
import aesara.tensor as at
import pymc as pm

with pm.Model() as m:
    # per-subject parameters, all drawn from the same set of priors
    alpha = pm.Beta('alpha', alpha=1, beta=1, shape=n_subj)
    beta = pm.Gamma('beta', alpha=3, beta=1/2, shape=n_subj)
    decay = pm.Beta('decay', alpha=2, beta=15, shape=n_subj)
    param = at.as_tensor_variable([alpha, beta, decay], dtype='float64')
    # custom likelihood over the concatenated data from all subjects
    like = pm.DensityDist('like', param, logp=agent.aesara_llik_td, observed=data_vec)
```
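One answer, hedged: since WAIC is built from pointwise log-likelihood terms, you may be able to avoid refitting by taking the pointwise log-likelihood the sampler already stored for the hierarchical fit (the `log_likelihood` group in ArviZ's `InferenceData`), slicing its columns by subject, and computing WAIC per subject. The shapes, random data, and `waic` helper below are illustrative, not PyMC API; note the per-subject scores still reflect partial pooling, since each subject's parameters were informed by the group.

```python
import numpy as np

def waic(log_lik):
    """WAIC (deviance scale) from a (n_samples, n_points) pointwise
    log-likelihood matrix."""
    lppd = np.sum(np.log(np.mean(np.exp(log_lik), axis=0)))
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2 * (lppd - p_waic)

# hypothetical pointwise log-likelihood from the hierarchical fit,
# columns ordered as subject 0's trials, then subject 1's, and so on
n_samples, n_subj, n_trials = 500, 30, 50
rng = np.random.default_rng(3)
log_lik = rng.normal(-1.0, 0.1, size=(n_samples, n_subj * n_trials))

# group columns by subject and score each subject separately
per_subject = log_lik.reshape(n_samples, n_subj, n_trials)
subject_waics = [waic(per_subject[:, s, :]) for s in range(n_subj)]
```

Because WAIC is additive over observations, the per-subject scores also sum back to the single all-subjects WAIC the fit already gave you.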