Leave one group out cross validation

In a hierarchical model where models are being compared on the predictive likelihood of a new group, the recommendation is to do “leave one group out” (LOGO) cross validation. It would be cool if ArviZ could do this automagically, but that’s not really something for this forum.

What I’d like to know is whether there is an example notebook or sample code for doing leave one group out cross validation. I’d like to see it done once before I try to figure it out myself.

Thanks,
Opher


I have a rough example of LOGO in a notebook on the Jupyter Notebook Viewer. If you are interested, we could also look into writing some docs for this.

I would love to work with you on documentation that explains how to do this, perhaps working through a linear regression example. Tell me how to get the ball rolling.

Opher

The first step is to identify a model and dataset suited for this (or several models, if instead of stopping at computing LOGO we want to do model comparison). It needs to be a hierarchical model for LOGO to make sense. Ideally, it would be a case where both LOO and LOGO make sense and are useful. After that, write a notebook that computes LOGO using PSIS for that model, following the approach outlined in the notebook above.
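The core of the computation can be sketched in a few lines of NumPy. This is a minimal sketch, assuming you already have a draws × observations array of pointwise log-likelihood from a model fit to all the data, plus a group label for each observation (the function names are hypothetical). Summing the log-likelihood within each group turns each group into a single "observation", and leave-one-out over those sums is LOGO. For simplicity it uses plain importance sampling, p(y_g | y_-g) ≈ 1 / mean_s[1 / p(y_g | θ_s)], not the Pareto-smoothed weights that PSIS applies, so it has no k-hat diagnostic and can be very noisy when a group is influential.

```python
import numpy as np


def group_log_lik(log_lik, groups):
    """Sum a (draws x observations) log-likelihood array within each group.

    Returns the sorted group labels and a (draws x groups) array, so that each
    group can be treated as one 'observation' for leave-one-out.
    """
    groups = np.asarray(groups)
    labels = np.unique(groups)
    summed = np.stack([log_lik[:, groups == g].sum(axis=1) for g in labels], axis=1)
    return labels, summed


def is_logo_elpd(ll_group):
    """Plain (not Pareto-smoothed) importance-sampling estimate of log p(y_g | y_-g).

    With draws from the full-data posterior, the importance weights are
    1 / p(y_g | theta_s), which gives the harmonic-mean form
    log p(y_g | y_-g) ~= -log(mean_s exp(-ll_group[s, g])).
    """
    neg = -np.asarray(ll_group, dtype=float)
    m = neg.max(axis=0)  # shift for a numerically stable log-mean-exp
    log_mean = m + np.log(np.mean(np.exp(neg - m), axis=0))
    return -log_mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder inputs: 32 observations in 6 groups, with random values
    # standing in for the pointwise log-likelihood of a real fitted model.
    groups = np.repeat(np.arange(6), [6, 6, 5, 5, 5, 5])
    log_lik = rng.normal(-1.0, 0.3, size=(2000, groups.size))
    labels, ll_g = group_log_lik(log_lik, groups)
    elpd_g = is_logo_elpd(ll_g)
    print("per-group elpd:", elpd_g)
    print("elpd_logo:", elpd_g.sum())
```

In ArviZ, the analogous route is presumably to sum the pointwise log-likelihood over each group before calling `az.loo`, which would then apply PSIS and report a k-hat per group; the notebook above is the place to check the exact details.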

Well, this has come up for me before, but the most recent example was a question that McElreath put in his homework on model comparison, for which he also uploaded a worked solution in Stan. As such, it might be interesting to contribute a different perspective.

Here is the homework; see the last (optional challenge) question: stat_rethinking_2022/week04.pdf on the main branch of rmcelreath/stat_rethinking_2022 on GitHub.

Here is the posted solution: stat_rethinking_2022/week04_solutions.pdf on the main branch of rmcelreath/stat_rethinking_2022 on GitHub.

Opher

The dinosaur model sounds good. The only potential issue I see for this use case, which is mostly illustrative, is that I would expect the LOO and LOGO comparisons to rank the models in roughly the same order, if not exactly the same one.

Do you know if there is some species-level information we could incorporate into the model? Maybe modulating the time before growth accelerates as a function of the maximum mass? Or a second hierarchy level separating herbivores from carnivores? (Note that I haven’t looked at the data yet; ideally, I would just like the LOO and LOGO approaches to give different elpd results.)

That’s interesting. Maybe you can explain why you think they would be the same.

In my head, in a hierarchical model, LOO measures how well the model fits the data given the dinosaur, while LOGO measures how well you can predict the next dinosaur given the ones you have.

In any case, the dataset doesn’t contain any other characteristics. The interesting thing about the dataset is that it is very small: 6 species of dinosaur and 32 data points in all. I thought this might be useful because it would make it practical to compare the importance sampling approximations with exact LOO and LOGO computed by refitting the models.
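To illustrate what an exact, refit-based "ground truth" looks like, here is a hedged sketch on a hypothetical toy model, not the dinosaur growth model: a Gaussian varying-intercept model with known variances, y_ij = mu + a_j + e_ij with mu ~ N(0, s_mu^2), a_j ~ N(0, tau^2) and e_ij ~ N(0, sigma^2). Integrating out mu and the a_j makes y jointly Gaussian, so "refit without fold A and predict it" reduces to the Gaussian conditional p(y_A | y_-A), with no MCMC required. The function names and default variances are illustrative assumptions.

```python
import numpy as np


def marginal_cov(groups, sigma, tau, s_mu):
    """Covariance of y after integrating out mu ~ N(0, s_mu^2) and a_j ~ N(0, tau^2)."""
    g = np.asarray(groups)
    same_group = (g[:, None] == g[None, :]).astype(float)
    return s_mu**2 + tau**2 * same_group + sigma**2 * np.eye(len(g))


def mvn_logpdf(x, mean, cov):
    """Log density of a multivariate normal."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)


def exact_fold_elpd(y, K, held_out):
    """Exact log p(y_A | y_B) for A = held_out and B = all other observations.

    For this zero-mean Gaussian model, the conditional is exactly what refitting
    on y_B and predicting y_A would give, with no sampling involved.
    """
    y = np.asarray(y, dtype=float)
    A = np.asarray(held_out, dtype=int)
    B = np.setdiff1d(np.arange(len(y)), A)
    K_AB = K[np.ix_(A, B)]
    K_BB = K[np.ix_(B, B)]
    mean = K_AB @ np.linalg.solve(K_BB, y[B])
    cov = K[np.ix_(A, A)] - K_AB @ np.linalg.solve(K_BB, K_AB.T)
    return mvn_logpdf(y[A], mean, cov)


def exact_loo_logo(y, groups, sigma=1.0, tau=1.0, s_mu=10.0):
    """Pointwise exact LOO (one term per observation) and LOGO (one term per group)."""
    K = marginal_cov(groups, sigma, tau, s_mu)
    g = np.asarray(groups)
    loo = np.array([exact_fold_elpd(y, K, [i]) for i in range(len(g))])
    logo = np.array(
        [exact_fold_elpd(y, K, np.flatnonzero(g == lab)) for lab in np.unique(g)]
    )
    return loo, logo
```

Comparing `loo.sum()` and `logo.sum()` from this toy model against importance-sampling estimates computed from posterior draws of the same model would show how well the approximation holds up when a whole group is dropped. For the dinosaur model itself, the exact values would have to come from actually refitting it once per left-out observation or species.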

The dataset is also interesting in that the dinosaurs differ greatly in size, with masses spanning 3 orders of magnitude. Thus, leaving out one species will affect the hierarchical parameters quite profoundly.

Opher

That’s right, having little data might do the trick.

This is one of the main points of confusion with LOO and WAIC: they never really measure how well the model fits the data. They are measures of predictive accuracy (after all, they are estimators of the elpd, the expected log predictive density).
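For reference, the two estimators can be written in standard notation, where $y_{-i}$ is the data without observation $i$, $y_g$ is every observation from group $g$, and $\theta$ includes the group-level parameters:

```latex
\mathrm{elpd}_{\mathrm{loo}} = \sum_{i=1}^{n} \log p(y_i \mid y_{-i}),
\qquad
\mathrm{elpd}_{\mathrm{logo}} = \sum_{g=1}^{G} \log p(y_g \mid y_{-g}),
\qquad
p(y_A \mid y_{-A}) = \int p(y_A \mid \theta)\, p(\theta \mid y_{-A})\, d\theta .
```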

In a hierarchical model, LOO evaluates predictive performance on single observations; in this case, that means predicting the mass of a dinosaur given its species and age (and only for a species already present in the data). LOGO, in contrast, evaluates predictive performance on a new species. In my limited experience with LOGO, unless there is some type of group-level information, the result is basically the same. You can also estimate the elpd using K-fold cross validation instead of LOO, so in cases where the LOGO folds and the K-fold folds are interchangeable, the results should actually be the same.
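The relationship between LOO, LOGO and K-fold becomes concrete once you look at the folds themselves. In this minimal pure-Python sketch (all helper names are hypothetical), the three schemes are just different partitions of the observation indices. LOGO is K-fold with the folds fixed to the groups, so it is the only one of the three that never leaves a held-out observation's groupmates in the training data.

```python
import random


def loo_folds(n):
    """LOO: every observation is held out on its own."""
    return [[i] for i in range(n)]


def logo_folds(groups):
    """LOGO: each fold holds out every observation of one group."""
    folds = {}
    for i, g in enumerate(groups):
        folds.setdefault(g, []).append(i)
    return list(folds.values())


def random_kfold(n, k, seed=0):
    """Random K-fold: shuffles observations and ignores group membership."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[j::k]) for j in range(k)]


def splits_groups(folds, groups):
    """True if some held-out fold leaves groupmates in the training data."""
    counts = {}
    for g in groups:
        counts[g] = counts.get(g, 0) + 1
    for fold in folds:
        held = {}
        for i in fold:
            held[groups[i]] = held.get(groups[i], 0) + 1
        if any(held[g] < counts[g] for g in held):
            return True
    return False


groups = ["a", "a", "a", "b", "b", "c", "c", "c"]
print(logo_folds(groups))                             # [[0, 1, 2], [3, 4], [5, 6, 7]]
print(splits_groups(logo_folds(groups), groups))      # False
print(splits_groups(loo_folds(len(groups)), groups))  # True
```

With a random K-fold split, a held-out observation will typically still have groupmates in the training data, so it mostly tests prediction for known groups, like LOO does.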

Maybe it’s time to take this offline and come back when we have something clear and helpful to say. I sent you a direct message.

Opher
