Comparing Models with WAIC

Let's say I have two linear models: one pooled and one hierarchical.
I would like to use WAIC to select the model that is significantly better than the other at the 5% level.

From the WAIC function, I get the WAIC value plus its standard error, which I could then turn into a 95% confidence interval (CI) using waic ± 1.96 * waic_se, assuming WAIC is normally distributed. If the confidence intervals do not overlap, then one model is better than the other. However, if they do overlap, I cannot be so sure.

What is a better way to compare than the above? I have a feeling it's something to do with the pointwise predictions that WAIC can output.

waic = pm.stats.waic(trace1, model1, pointwise=True)

Could I compare these pointwise predictions from the two models, and then check whether the 95% CI of the difference distribution crosses zero?
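To make the idea concrete, here is a minimal sketch (NumPy only, with made-up stand-ins for the pointwise contributions you would get from `pm.stats.waic(..., pointwise=True)`) of how the paired pointwise differences give a standard error for the WAIC difference:

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs = 100

# Stand-ins for the pointwise WAIC contributions of each model;
# in practice these come from pm.stats.waic(trace, model, pointwise=True).
waic_i_pooled = rng.normal(loc=2.0, scale=0.5, size=n_obs)
waic_i_hier = rng.normal(loc=1.8, scale=0.5, size=n_obs)

# Paired pointwise differences: because the same observations enter
# both models, the difference has a smaller SE than the two WAICs alone.
diff = waic_i_pooled - waic_i_hier
d_waic = diff.sum()                  # difference in total WAIC
d_se = np.sqrt(n_obs * diff.var())   # standard error of that difference

print(f"dWAIC = {d_waic:.2f} +/- {d_se:.2f}")
```

The point is that the paired difference accounts for the correlation between the two models' pointwise values, which the two separate CIs ignore.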



Are you sure you want to do significance testing, at a level of 0.05? Half (at least) of the people here will scream in terror reading that part :wink:

There are many ways to do model comparison, and the current recommendations are prediction-based (posterior predictive checks with pm.sample_ppc, cross-validation with pm.loo). You can see more information here and here.

In the end, it kind of comes down to what you are going to do with the selected model. If you are going to use it for prediction, then there are better ways to do so (model averaging). If you want to say that it's the model that best explains your data, then you can just describe the one with the least prediction error.
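As an illustration of the model-averaging idea (a sketch of pseudo-BMA weighting with invented WAIC values, not PyMC's own implementation): model weights can be formed from relative WAIC as w_k ∝ exp(-ΔWAIC_k / 2), and predictions are then averaged with those weights.

```python
import numpy as np

# Hypothetical WAIC values for two models (lower is better).
waics = np.array([250.3, 245.7])

# Pseudo-BMA weights: w_k proportional to exp(-0.5 * delta_k),
# where delta_k is each model's WAIC minus the best (lowest) WAIC.
delta = waics - waics.min()
weights = np.exp(-0.5 * delta)
weights /= weights.sum()

print(weights)  # the better model gets most, but not all, of the weight
```

Rather than picking a single winner, the averaged prediction hedges across models in proportion to their estimated predictive accuracy.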


I don’t know much about the context here, but in general I don’t think it is all that useful to talk about significant differences between models. You can try to build a larger model that incorporates both models as special cases and fit that instead (if that makes any sense in your situation).


Yeah, I understand the arguments against significance testing. The reason I ask is that I have a reviewer asking whether the hierarchical version of the model is significantly better given its extra complexity (old habits die hard, I guess…).

I’m using these models to explain the data, so I guess I will just use the one with the least prediction error.

If you feel like being evil, you could ask him if he could please define what he means by “significantly better” :innocent:


There must be a reference out there with a strongly worded argument against it…


Maybe a little late, but I want to recommend that you read chapter 6 of Statistical Rethinking. In that chapter, Richard McElreath discusses the question "How big a difference in WAIC is 'significant'?" Spoiler alert: he says, "In general, it is not possible to provide a principled threshold of difference that makes one model 'significantly' better than another, whatever that means".

You can use the functions pm.compare and pm.compareplot to present the results of your analysis and help interpret them. Both of these functions are based on the chapter mentioned above. If you use pm.compare with the argument method='BB-pseudo-BMA', the standard errors of WAIC are computed using Bayesian bootstrapping; this is a good idea given that the WAIC distribution could be skewed. Also notice that pWAIC is an estimate of the effective number of parameters in your model. You probably should not take that number too seriously, but I guess it can give you an idea of the flexibility/complexity of your model, which may help you discuss your results and answer the reviewer. Notice also that hierarchical models are less complex than they seem; this is related to the shrinkage effect.
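To give a feel for what the Bayesian bootstrap is doing, here is a rough NumPy sketch with invented pointwise WAIC contributions (pm.compare handles this internally, so this is only illustrative): instead of resampling observations, each replicate reweights them with a Dirichlet(1, …, 1) draw, and the spread of the reweighted totals gives a standard error that makes no normality assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_boot = 100, 1000

# Invented pointwise WAIC contributions for one model.
waic_i = rng.gamma(shape=2.0, scale=1.0, size=n_obs)

# Bayesian bootstrap: draw Dirichlet weights over the observations,
# form the weighted total for each draw, and take the spread.
dirichlet_w = rng.dirichlet(np.ones(n_obs), size=n_boot)
boot_totals = n_obs * dirichlet_w @ waic_i   # (n_boot,) replicates of the total
bb_se = boot_totals.std()

print(f"WAIC = {waic_i.sum():.1f}, BB standard error = {bb_se:.1f}")
```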


Thanks @aloctavodia, I’ll give this a go.
Also, thanks for the chapter recommendation. Some good solid foundation in model comparison would help me a lot I think.

I really appreciate all the help from the pyMC_devs team, you guys have been very responsive and helpful :slight_smile: