Let's say I have two linear models, one pooled and one hierarchical.
I would like to use WAIC to select the model that is significantly better than the other at the 5% level.
From the WAIC function, I get the WAIC value plus its standard error, which I could then turn into a 95% confidence interval (CI) using waic ± 1.96*waic_se, assuming WAIC is normally distributed. If the confidence intervals do not overlap, then one model is better than the other. However, if they do overlap, I cannot be so sure.
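As a sketch of the interval-overlap idea above (the WAIC values and standard errors here are made-up illustrative numbers, not output from a real fit):

```python
import numpy as np

# Illustrative WAIC estimates and standard errors (made-up numbers,
# standing in for the output of pm.stats.waic for each model).
waic_pooled, se_pooled = 212.4, 10.3
waic_hier, se_hier = 198.7, 9.8

z = 1.96  # normal quantile for a 95% interval

# 95% confidence intervals, assuming the WAIC estimate is normal
ci_pooled = (waic_pooled - z * se_pooled, waic_pooled + z * se_pooled)
ci_hier = (waic_hier - z * se_hier, waic_hier + z * se_hier)

# The intervals overlap unless one lies entirely above the other
overlap = ci_pooled[0] <= ci_hier[1] and ci_hier[0] <= ci_pooled[1]
print(ci_pooled, ci_hier, overlap)
```

With these numbers the intervals overlap, which is exactly the inconclusive case described above.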
What would be a better way to compare them? I have a feeling it's something to do with the pointwise predictions that WAIC can output:
waic = pm.stats.waic(trace1, model1, pointwise=True)
Could I compare these pointwise values from the two models, and then check whether the 95% CI of this difference distribution crosses zero?
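The pointwise idea can be sketched like this: take paired differences of the per-observation contributions and compute the standard error of their sum (this is in the spirit of the elpd difference in Vehtari, Gelman & Gabry's LOO/WAIC paper). The arrays here are simulated stand-ins for the pointwise output of `pm.stats.waic(trace, model, pointwise=True)`:

```python
import numpy as np

# Hypothetical pointwise WAIC contributions for the same n observations
# under each model (simulated here; in practice they come from
# pm.stats.waic(..., pointwise=True)).
rng = np.random.default_rng(0)
n = 200
waic1_i = rng.normal(1.0, 0.5, size=n)  # e.g. pooled model
waic2_i = rng.normal(0.9, 0.5, size=n)  # e.g. hierarchical model

# Paired pointwise differences
diff = waic1_i - waic2_i

# Total difference and its standard error: the total is the sum of n
# paired differences, so its SE is sqrt(n) times their sample sd
d_total = diff.sum()
d_se = np.sqrt(n) * diff.std(ddof=1)

# A rough 95% interval for the difference; if it excludes zero, one
# model predicts clearly better than the other on this data
ci = (d_total - 1.96 * d_se, d_total + 1.96 * d_se)
print(d_total, d_se, ci)
```

Note that pairing matters: the same observation's contribution under each model is correlated, so the SE of the difference is usually smaller than what you would get from the two separate WAIC standard errors.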
Are you sure you want to do significance testing, at a level of 0.05? Half (at least) of the people here will scream in terror reading that part.
There are many ways to do model comparison, and the current recommendations are prediction based (posterior predictive checks, pm.loo). You can see more information here and here.
In the end, it kind of comes down to what you are going to do with the selected model. If you are going to use it for prediction, then there are better ways to do so (model averaging). If you want to say that it's the model that best explains your data, then you can just describe the one with the least prediction error.
I don’t know much about the context here, but in general I don’t think it is all that useful to talk about significant differences between models. You can try to build a larger model that incorporates both models as special cases and fit that instead (if that makes any sense in your situation).
Yeah, I understand the arguments against significance testing. The reason I ask is that I have a reviewer asking whether the hierarchical version of the model is significantly better given its extra complexity (old habits die hard, I guess…).
I'm using these models to explain the data, so I guess I will just use the one with the least prediction error.
If you feel like being evil, you could ask him if he could please define what he means by "significantly better".
There must be a reference out there with a strongly worded argument against it…
Maybe a little late, but I want to recommend that you read chapter 6 of Statistical Rethinking, where Richard McElreath discusses the question "How big a difference in WAIC is 'significant'?" Spoiler alert, he says: "In general, it is not possible to provide a principled threshold of difference that makes one model 'significantly' better than another, whatever that means."
You can use the function pm.compare() to run the comparison and pm.compareplot to present the result of your analysis and help interpret it. Both of these functions are based on the chapter mentioned above. If you use pm.compare() with the argument method='BB-pseudo-BMA', the standard errors of WAIC are computed using Bayesian bootstrapping, which is a good idea given that the WAIC distribution could be skewed. Also notice that pWAIC is an estimate of the effective number of parameters in your model; you probably should not take that number too seriously, but I guess it can give you an idea of the flexibility/complexity of your model, which may help you discuss your results and answer the reviewer. Notice that hierarchical models are less complex than they seem, which is related to the shrinkage effect.
Thanks @aloctavodia, I'll give this a go.
Also, thanks for the chapter recommendation. Some good solid foundation in model comparison would help me a lot I think.
I really appreciate all the help from the PyMC devs team, you guys have been very responsive and helpful.