I have two questions. The first one is related to the evaluation of models that are not predictive. I am interested in fitting a GLM (Poisson) model to describe the coefficients of the covariates that are correlated with my dependent variable. Thus, I am not interested in out-of-sample predictions.
I’ve seen many papers using DIC as an evaluation metric (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4282098/). In the pymc3 github I also read about many people suggesting WAIC as an evaluation metric. After reading this paper (“Understanding predictive information criteria for Bayesian models” https://link.springer.com/article/10.1007/s11222-013-9416-2) my questions are:
Isn’t WAIC “just” related to out-of-sample/generalization prediction? So does it make sense to use it in my project?
What should I do when I have these warnings? Can I solve them somehow? (not clear to me how). If not, can I trust WAIC?
/home/nadai/.local/lib/python3.4/site-packages/pymc3/stats.py:213: UserWarning: For one or more samples the posterior variance of the
log predictive densities exceeds 0.4. This could be indication of
WAIC starting to fail see http://arxiv.org/abs/1507.04544 for details
WAIC WAIC_r(WAIC=158.42521828487168, WAIC_se=2.3092993247226805, p_WAIC=7.2801294)
/home/nadai/.local/lib/python3.4/site-packages/pymc3/stats.py:278: UserWarning: Estimated shape parameter of Pareto distribution is
greater than 0.7 for one or more samples.
You should consider using a more robust model, this is
because importance sampling is less likely to work well if the marginal
posterior and LOO posterior are very different. This is more likely to
happen with a non-robust model and highly influential observations.
LOO LOO_r(LOO=164.91826, LOO_se=2.4069428499178245, p_LOO=10.526650309191908)
Are papers using DIC in hierarchical models all wrong?
Thanks, your answer was very complete! The second point is especially important to me.
What about WAIC/DIC in the presence of random effects, such as in the Conditional Autoregressive (CAR) model? If I compute the DIC/WAIC on i) a model with just the random effect variables and ii) a model with random effects AND covariates, I always get lower or equal DIC/WAIC for the former.
How do you interpret these results? I think the random effects confuse the metrics, don’t they? Do you have any experience with this?
I ask because this is one of my sources of confusion about the evaluations.
You are welcome.
I am not familiar with the CAR model (I will read about it!), but if you can consider your data points as independent then you can use WAIC. I think Gelman has an example where they have a time series, but for their purpose they model it as a simple linear regression and hence they can apply WAIC.
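To make the independence point concrete, here is a minimal NumPy sketch of how WAIC is assembled from per-observation terms. It is not the pymc3 implementation: the function name `waic_from_loglik` is hypothetical, the toy data and the conjugate Gamma prior are assumptions chosen so that no sampler is needed, and the result is reported on the deviance scale (-2 times the ELPD) to match the output quoted earlier in this thread. The 0.4 threshold is the one mentioned in the warning text.

```python
import numpy as np
from math import lgamma


def waic_from_loglik(log_lik):
    """Compute WAIC from a (n_draws, n_obs) matrix of log p(y_i | theta_s).

    Hypothetical helper for illustration; returns (waic, p_waic, p_waic_i).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # Log pointwise predictive density: log of the posterior-mean likelihood,
    # computed per observation with a stable log-sum-exp.
    m = log_lik.max(axis=0)
    lppd_i = m + np.log(np.exp(log_lik - m).sum(axis=0)) - np.log(n_draws)
    # Penalty: posterior variance of the log-likelihood, per observation.
    p_waic_i = log_lik.var(axis=0, ddof=1)
    # Sum over observations, then move to the deviance scale.
    waic = -2.0 * (lppd_i - p_waic_i).sum()
    return waic, p_waic_i.sum(), p_waic_i


# Toy data (assumed): counts with an intercept-only Poisson model.
rng = np.random.default_rng(1)
y = np.array([2, 3, 1, 4, 2, 3, 0, 5, 2, 3])

# Gamma(a, b) prior on the rate gives a Gamma(a + sum(y), b + n) posterior.
a, b = 1.0, 1.0
lam = rng.gamma(shape=a + y.sum(), scale=1.0 / (b + len(y)), size=2000)

# Pointwise Poisson log-likelihood, shape (n_draws, n_obs).
log_fact = np.array([lgamma(k + 1) for k in y])
log_lik = y * np.log(lam[:, None]) - lam[:, None] - log_fact

waic, p_waic, p_waic_i = waic_from_loglik(log_lik)
# Observations whose penalty term exceeds the threshold named in the warning.
flagged = np.flatnonzero(p_waic_i > 0.4)
print(waic, p_waic, flagged)
```

Every term is a sum over single observations, which is why the factorization of the likelihood into independent pieces matters for WAIC.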
I am trying to understand the limitations of WAIC so if I find something I will let you know. Thanks for your question.
I used WAIC and LOO, and both of them raised a warning:
WAIC: UserWarning: For one or more samples the posterior variance of the log predictive densities exceeds 0.4. This could be indication of WAIC starting to fail.
And LOO: Estimated shape parameter of Pareto distribution is greater than 0.7 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
Does that mean my model is not good? Can I still use WAIC and LOO as a reference for model selection?
It means that you have at least one observation that may be problematic. This could be a problem in your data, like an input mistake (you wrote 200 instead of 2), or more generally the model may not be able to describe the observation(s). For example, if you are modeling count data with a Poisson distribution but your data are overdispersed, a NegativeBinomial will probably be a better idea. To help diagnose the problem you can use LOO and functions like arviz.plot_khat and arviz.plot_elpd.
If you are seeing those warnings, it means that the approximations used to compute WAIC and LOO may not be reliable, so it is better to solve those problems. Alternatively, you can use az.loo to get the ELPD of the non-problematic observations (k hat < 0.7) and then explicitly compute the ELPD for the problematic observations (k hat > 0.7) by refitting the model and actually leaving one observation out. Of course this is only a good idea if you have just a few problematic points, otherwise refitting the model many times will be too expensive. You can read more about this in the articles of the loo package documentation.
I plotted the khat values and the result is as shown: only one data point is over 0.7.
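As a quick first look at the Poisson versus NegativeBinomial question, you can compare the sample variance with the sample mean, since a Poisson distribution implies the two are equal. The sketch below uses simulated counts (an assumption, not data from this thread) and only applies directly to an intercept-only setting. With covariates you would look at the dispersion of the residuals around the fitted means instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated counts (assumed): negative binomial draws, overdispersed by construction.
# With n=2 and p=0.2 the mean is n(1-p)/p = 8 and the variance is n(1-p)/p^2 = 40.
y = rng.negative_binomial(n=2, p=0.2, size=500)

mean = y.mean()
var = y.var(ddof=1)

# A ratio near 1 is compatible with a Poisson likelihood;
# a ratio well above 1 suggests overdispersion.
dispersion = var / mean
print(f"mean={mean:.2f} var={var:.2f} var/mean={dispersion:.2f}")
```

A ratio far above 1 is only a hint. The khat and ELPD plots still tell you which individual observations the model struggles with.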
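The bookkeeping for that mixed approach can be sketched as follows. This is only an illustration under stated assumptions: `exact_lpd` and `combine_elpd` are hypothetical helpers, not ArviZ functions, and the pointwise ELPD values, khat values and refit result are made-up numbers. In practice the pointwise values would come from the LOO computation, and the exact value would come from refitting the model without the flagged observation.

```python
import numpy as np


def exact_lpd(heldout_loglik_draws):
    """Exact log predictive density of one held-out observation.

    Takes log p(y_i | theta_s) evaluated on draws from the posterior that
    was fitted WITHOUT observation i, and returns the log of their mean.
    """
    x = np.asarray(heldout_loglik_draws, dtype=float)
    m = x.max()
    return m + np.log(np.exp(x - m).mean())


def combine_elpd(elpd_psis, khat, exact, threshold=0.7):
    """Replace the PSIS estimate with the exact value where khat > threshold.

    `exact` maps an observation index to its exact log predictive density.
    Returns the total ELPD and the indices that were replaced.
    """
    elpd = np.asarray(elpd_psis, dtype=float).copy()
    bad = np.flatnonzero(np.asarray(khat) > threshold)
    for i in bad:
        if int(i) not in exact:
            raise ValueError(f"observation {int(i)} needs a refit")
        elpd[i] = exact[int(i)]
    return elpd.sum(), bad


# Made-up numbers: four observations, the third one has a high khat.
elpd_psis = [-2.1, -1.9, -6.5, -2.0]
khat = [0.1, 0.3, 0.9, 0.2]
exact = {2: -7.2}  # pretend result of refitting without observation 2

total, replaced = combine_elpd(elpd_psis, khat, exact)
print(total, replaced)
```

The cost is one refit per flagged observation, which is why this only pays off when few points exceed the threshold.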