LOO-CV for a hierarchical model

Hi all

I would like to validate a hierarchical model using LOO-CV. I want to estimate the validation metric both when leaving out a single observation and when leaving out an entire group of observations (LOGO-CV).

Is this possible using ArviZ? If not, what approach would you suggest?

Thank you in advance.

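Hi! You choose between LOO and LOGO through the log likelihood data you pass to the loo or waic functions: one entry per observation gives leave-one-out, one entry per group gives leave-one-group-out.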

Here is a quick example using the radon dataset, which comes with ArviZ. It has 919 observations from a total of 85 counties, and the number of observations varies by county.

import arviz as az
idata = az.load_arviz_data("radon")
az.loo(idata, var_name="y")
Computed from 2000 posterior samples and 919 observations log-likelihood matrix.

         Estimate       SE
elpd_loo -1027.18    28.85
p_loo       26.82        -
------

Pareto k diagnostic values:
                         Count   Pct.
(-Inf, 0.5]   (good)      919  100.0%
 (0.5, 0.7]   (ok)          0    0.0%
   (0.7, 1]   (bad)         0    0.0%
   (1, Inf)   (very bad)    0    0.0%

This performs leave-one-out, because the original pointwise log likelihood has one entry per observation. Note that the loo output reports 2000 posterior samples and 919 observations. If we instead sum the log likelihood within each county, so that there is one entry per group, we get:

idata.log_likelihood["c"] = idata.log_likelihood.y.groupby(idata.constant_data["county_idx"]).sum()
az.loo(idata, var_name="c")
Computed from 2000 posterior samples and 85 observations log-likelihood matrix.

         Estimate       SE
elpd_loo -1028.21   183.50
p_loo       24.16        -

There has been a warning during the calculation. Please check the results.
------

Pareto k diagnostic values:
                         Count   Pct.
(-Inf, 0.5]   (good)       60   70.6%
 (0.5, 0.7]   (ok)         18   21.2%
   (0.7, 1]   (bad)         7    8.2%
   (1, Inf)   (very bad)    0    0.0%
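
A note on why summing works: if the observations are conditionally independent given the parameters, the joint log likelihood of a group is the sum of the pointwise log likelihoods of its members, so the grouped matrix has one column per county instead of one per observation. Below is a minimal NumPy sketch of that step. The helper name `group_log_likelihood` and the toy values are made up for illustration and are not part of ArviZ.

```python
import numpy as np


def group_log_likelihood(log_lik, group_idx):
    """Sum a (n_samples, n_obs) pointwise log-likelihood matrix within groups.

    Returns the sorted group labels and a (n_samples, n_groups) matrix.
    Hypothetical helper for illustration only, not an ArviZ function.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    group_idx = np.asarray(group_idx)
    labels = np.unique(group_idx)
    # For each group, add up the columns of the observations that belong to it.
    grouped = np.stack(
        [log_lik[:, group_idx == g].sum(axis=1) for g in labels], axis=1
    )
    return labels, grouped


# Toy data (assumed values): 4 posterior samples, 6 observations in 3 groups.
rng = np.random.default_rng(0)
log_lik = rng.normal(size=(4, 6))
group_idx = np.array([0, 0, 1, 2, 2, 2])

labels, grouped = group_log_likelihood(log_lik, group_idx)
print(grouped.shape)  # (4, 3)
```

The xarray groupby call above does the same thing while keeping the chain and draw dimensions, which is why loo then reports 85 observations. Also note the Pareto k table for the grouped case: 7 counties have k above 0.7, which is presumably what the warning refers to, so the PSIS estimate for those groups should be treated with caution.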

Thank you! This approach solves my problem.