Calculating WAIC/LOO on different size datasets

Hi there,

Suppose I have the following two datasets for a simple hierarchical linear regression with no intercept:

  • Dataset 1: X1, Y1
  • Dataset 2: X2, Y2

Each observation in X1, X2, Y1 and Y2 is a scalar.

Dataset 2 is a transformed version of Dataset 1, and it can differ from it in two ways:

  1. The X and Y values can be different.
  2. The number of observations can be different (due to filtering or other things).

I am reasonably well-versed in using WAIC/LOO for model comparison on a fixed dataset, but would it ever make sense to use WAIC/LOO for model comparison when the datasets are different?

What I’m trying to understand is on which dataset a given linear model specification is most likely to generalize. For my purposes, I can use either dataset and still recover the quantity of interest at the end.

For issue 2, I am aware that WAIC/LOO scale with dataset size, but would it be possible to divide by the sample size to get a kind of normalized estimate? I think this is being done in section 9.3.1 of this book:

https://bookdown.org/marklhc/notes_bookdown/model-comparison-and-regularization.html
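
To make the "divide by the sample size" idea concrete, here is a minimal sketch of the arithmetic in plain NumPy. It assumes a pointwise log-likelihood matrix of shape (n_draws, n_obs) is available; the names `logmeanexp` and `elpd_waic` are made up for illustration and are not taken from the book or any library. It only shows how a per-observation value is obtained. Whether that value is comparable across different datasets is a separate question.

```python
import numpy as np

def logmeanexp(a, axis=0):
    # Numerically stable log(mean(exp(a))) along one axis.
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))

def elpd_waic(log_lik):
    """WAIC-based elpd from a (n_draws, n_obs) pointwise log-likelihood matrix.

    Returns (total elpd, its standard error, elpd per observation).
    """
    lppd_i = logmeanexp(log_lik, axis=0)     # log pointwise predictive density
    p_waic_i = log_lik.var(axis=0, ddof=1)   # pointwise effective-parameter penalty
    elpd_i = lppd_i - p_waic_i
    n = elpd_i.size
    total = elpd_i.sum()
    se = np.sqrt(n * elpd_i.var(ddof=1))
    return total, se, total / n
```

The total is a sum over observations, so it grows with n; the third return value is simply that sum divided by n.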

For issue 1, I am not sure.

Any help would be much appreciated!

I am quite sure the answer is no to both. See, for example, the Cross-validation FAQ. You can compare models fitted to transformed versions of the same dataset if you include the Jacobian of the transformation, as shown in https://oriolabrilpla.cat/en/blog/posts/2019/loo-cv-transformed-data.html (another example is linked in the FAQ above), but not if different filtering is applied to the data for each model.
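
As a rough illustration of the Jacobian adjustment, here is a minimal NumPy sketch. It assumes, purely for the example, that the second model was fitted to z = log(y) with a normal likelihood; the function names are hypothetical and this is not the code from the linked post. The idea is that the pointwise log-likelihood of the transformed-data model is moved back to the scale of the original y by adding log|dz/dy| for each observation, so that both models are scored on the same data.

```python
import numpy as np

def normal_logpdf(z, mu, sigma):
    # Log density of a normal distribution evaluated at z.
    return -0.5 * np.log(2 * np.pi * sigma**2) - (z - mu) ** 2 / (2 * sigma**2)

def adjust_for_log_transform(log_lik_z, y):
    """Convert log p(z_i | theta), with z = log(y), into log p(y_i | theta).

    log_lik_z: array of shape (n_draws, n_obs) on the transformed scale.
    y: array of shape (n_obs,) with the original, strictly positive observations.
    """
    log_jacobian = -np.log(y)        # log|d log(y) / dy| = -log(y)
    return log_lik_z + log_jacobian  # broadcasts over the draws dimension
```

The adjusted matrix can then be passed to whatever WAIC/LOO routine is used for the untransformed model. This only works because both models describe the same observations, which is why it does not extend to datasets that were filtered differently.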