Calculating WAIC/LOO on different size datasets

cocodimama · August 20, 2023, 4:39pm

Hi there,

Suppose I have the following two datasets for a simple hierarchical linear regression with no intercept:

Dataset 1: X1, Y1
Dataset 2: X2, Y2

X1, X2, Y1 and Y2 are all scalars.

Dataset 2 is a transformed version of Dataset 1 that can differ from it in two ways:

1. The X and Y values are different.
2. It can have a different number of observations (due to filtering or other things).

I am reasonably well-versed in using WAIC/LOO for model comparison on a fixed dataset, but would it ever make sense to use WAIC/LOO for model comparison when the datasets are different?

What I’m trying to understand is on which dataset a given linear model specification is most likely to generalize. For my purposes, I can use either dataset and still recover the quantity of interest at the end.

For issue 2, I am aware that WAIC/LOO scale with dataset size, but would it be possible to divide by the sample size to get a kind of normalized (per-observation) estimate? I think this is done in Section 9.3.1 of this book:

https://bookdown.org/marklhc/notes_bookdown/model-comparison-and-regularization.html
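To make the "divide by the sample size" idea concrete, here is a minimal NumPy sketch of WAIC computed from an S × n pointwise log-likelihood matrix (S posterior draws, n observations), reporting both the total and the per-observation average. The data and the "posterior draws" are simulated stand-ins, and `waic_summary` and `fake_log_lik` are hypothetical helper names; in a PyMC workflow the matrix would come from the `log_likelihood` group and `az.waic` / `az.loo` would normally do this work. The sketch only shows that the total grows with n while the mean does not; averaging does not by itself make scores computed on different observations comparable.

```python
import numpy as np

rng = np.random.default_rng(0)

def logsumexp(a, axis=0):
    # Numerically stable log(sum(exp(a))) along `axis`.
    a_max = np.max(a, axis=axis, keepdims=True)
    out = np.log(np.sum(np.exp(a - a_max), axis=axis, keepdims=True)) + a_max
    return np.squeeze(out, axis=axis)

def normal_logpdf(y, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2

def waic_summary(log_lik):
    """log_lik has shape (S draws, n observations) and holds log p(y_i | theta_s)."""
    S, n = log_lik.shape
    # Log pointwise predictive density: log of the posterior-mean likelihood per observation.
    lppd_i = logsumexp(log_lik, axis=0) - np.log(S)
    # Penalty: posterior variance of each observation's log-likelihood.
    p_waic_i = np.var(log_lik, axis=0, ddof=1)
    elpd_i = lppd_i - p_waic_i
    return {
        "elpd_waic": elpd_i.sum(),           # total; grows with n
        "elpd_waic_per_obs": elpd_i.mean(),  # total divided by n
        "waic_deviance": -2 * elpd_i.sum(),  # same total on the deviance scale
        "n": n,
    }

def fake_log_lik(n, S=2000):
    # Hypothetical no-intercept data y = 1.5 * x + noise, with simulated
    # draws of beta standing in for a real posterior (sigma fixed at 0.5).
    x = rng.normal(size=n)
    y = 1.5 * x + rng.normal(scale=0.5, size=n)
    beta = rng.normal(1.5, 0.5 / np.sqrt(n), size=S)
    return normal_logpdf(y[None, :], beta[:, None] * x[None, :], 0.5)

small = waic_summary(fake_log_lik(50))
large = waic_summary(fake_log_lik(500))
print(small["elpd_waic"], large["elpd_waic"])                  # totals differ roughly 10x
print(small["elpd_waic_per_obs"], large["elpd_waic_per_obs"])  # means are on a similar scale
```

Here both datasets are drawn from the same process, which is the only reason the per-observation means line up; with filtered or transformed observations each mean averages over a different set of y values.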

For issue 1, I am not sure.

Any help would be much appreciated!

OriolAbril · September 27, 2023, 7:49am

I am quite sure the answer is no to both. See, for example, the Cross-validation FAQ. You can compare models fitted to transformed versions of the same dataset if you include the Jacobian of the transformation, as shown in https://oriolabrilpla.cat/en/blog/posts/2019/loo-cv-transformed-data.html (another example is linked in the FAQ above), but not if different filtering has been applied to the data for each model.
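As an illustration of the Jacobian correction, here is a minimal sketch assuming the transformation is z = log(y) and the model is Normal on z. By the change-of-variables rule, log p(y_i) = log p(z_i) + log|dz/dy| = log p(z_i) - log(y_i), so each pointwise log-likelihood is shifted by a constant. The data, the simulated posterior draws and the helper `to_original_scale` are hypothetical; this is not the code from the linked post, only the same idea. It applies to an invertible transformation of the same observations and cannot fix the filtering case.

```python
import numpy as np

def normal_logpdf(z, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((z - mu) / sigma) ** 2

def lognormal_logpdf(y, mu, sigma):
    # Density of y when log(y) ~ Normal(mu, sigma), written directly on the y scale.
    return -np.log(y * sigma * np.sqrt(2 * np.pi)) - (np.log(y) - mu) ** 2 / (2 * sigma**2)

def to_original_scale(log_lik_z, y):
    """Map an (S, n) log-likelihood computed for z = log(y) back to the y scale.

    log p(y_i) = log p(z_i) + log|dz/dy| = log p(z_i) - log(y_i)
    """
    return log_lik_z - np.log(y)[None, :]

rng = np.random.default_rng(1)
y = rng.lognormal(mean=0.3, sigma=0.8, size=40)     # hypothetical positive data
mu = rng.normal(0.3, 0.1, size=1000)                # stand-in posterior draws
sigma = np.abs(rng.normal(0.8, 0.05, size=1000))

# Log-likelihood of the model fitted on the log scale, then Jacobian-adjusted.
log_lik_z = normal_logpdf(np.log(y)[None, :], mu[:, None], sigma[:, None])
log_lik_y = to_original_scale(log_lik_z, y)
```

Because -log(y_i) is the same for every draw, the shift can equally be applied to the pointwise elpd values after LOO or WAIC has been computed; the WAIC penalty term is unaffected by it. The adjusted values can then be compared with a model fitted to y directly.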

Related topics

- Comparing Models with WAIC (Questions, September 19, 2017)
- Is model with high or low WAIC and LOO better (version agnostic, arviz, August 12, 2021)
- Model comparison for individual and combined datasets (Questions, August 3, 2022)
- Comparing Different Models on New Data (Questions, September 2, 2018)
- LOO-CV for hierarcical model (v5, modeling, arviz, April 4, 2023)