Calculating WAIC/LOO on different size datasets

Hi there,

Suppose I have the following two datasets for a simple hierarchical linear regression with no intercept:

  • Dataset 1: X1, Y1
  • Dataset 2: X2, Y2

X1, X2, Y1 and Y2 are all scalars.

Dataset 2 is a transformed version of Dataset 1, and it can differ in two ways:

  1. The X and Y values can be different
  2. The number of observations can be different (due to filtering or other things)

I am reasonably well-versed in using WAIC/LOO for model comparison on a fixed dataset, but would it ever make sense to use WAIC/LOO for model comparison when the datasets are different?

What I’m trying to understand is on which dataset a given linear model specification is most likely to generalize. For my purposes, I can use either dataset and still recover the quantity of interest at the end.

For issue 2, I am aware that WAIC/LOO scale with dataset size, but would it be possible to divide by the sample size here to get a kind of normalized estimate? I think this is being done in section 9.3.1 of this book:

For issue 1, I am not sure.

Any help would be much appreciated!

I am quite sure the answer is no to both. See for example the Cross-validation FAQ. You can compare models fit to transformed versions of the same dataset if you include the Jacobian of the transformation, as shown in (also another example linked in the FAQ above), but not if different filtering is applied to the data for each model, because the models are then no longer predicting the same set of observations.
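
To make the Jacobian point concrete, here is a minimal sketch rather than a definitive recipe. It assumes a hypothetical case where one model is fit to y directly and another to z = log(y) with a normal likelihood. The observations, the parameter values, and the helper name `log_scale_to_y_scale` are all made up for illustration. The idea is to put the pointwise log-likelihood of the log-scale model back on the scale of y before computing LOO/WAIC:

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)

def log_scale_to_y_scale(logdens_z, y):
    """Convert a log density evaluated at z = log(y) into a log density for y.

    Change of variables: p_Y(y) = p_Z(log y) * |dz/dy| = p_Z(log y) / y,
    so on the log scale we add log|dz/dy| = -log(y).
    """
    return logdens_z - math.log(y)

# Hypothetical observations and parameter values, for illustration only.
y_obs = [0.5, 1.2, 3.0]
mu, sigma = 0.1, 0.8

# Pointwise log-likelihood of the model fit to z = log(y) ...
loglik_z = [normal_logpdf(math.log(y), mu, sigma) for y in y_obs]
# ... moved back to the scale of y so it is comparable with a model fit to y.
loglik_y = [log_scale_to_y_scale(lz, y) for lz, y in zip(loglik_z, y_obs)]
```

In a real analysis the same adjustment would be applied to the pointwise log-likelihood of every posterior draw before passing it to LOO/WAIC. It only helps when both models are evaluated on exactly the same observations, so it does not rescue the case where the datasets have been filtered differently.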