Comparing models and selecting it with "weight" of arviz.compare

marcodena · June 10, 2021, 1:32pm

Hi everybody,

I have built a few models using PyMC3 and I am comparing them with arviz.compare to find the best one.
Whichever information criterion I use (either ‘WAIC’ or ‘PSIS-LOO’), I get the same ranking, but the IC standard errors differ a lot.

In particular, if I use WAIC, the results highlight one best model: the models' WAIC values are far apart relative to their standard errors. However, if I use PSIS-LOO, the ranking stays the same (it is based on the IC values), but the standard errors of the ICs seem to imply that the ranking is not significant and thus that the model selection is not reliable.

arviz.compare also provides a metric called “weight”, which according to the documentation “can be loosely interpreted as the probability of each model given the data” (see arviz.compare — ArviZ dev documentation).

I have two questions about the weight metric:

1. I have been looking at the code, but I am not sure I understand exactly what it is doing. Can anyone provide a reference for this specific operation?
2. Is it correct to use this measure to decide whether a model is “sufficiently” better than another one, e.g. by imposing a minimum value on the weight of the best model (weight >= 0.99)?

Thanks a lot for your help.

OriolAbril · June 10, 2021, 4:09pm

Interpretation of standard errors is still a bit of an open question, see for example [2008.10296] Uncertainty in Bayesian Leave-One-Out Cross-Validation Based Model Comparison.
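As a rough illustration of why the standard error of each model's IC can be misleading on its own: when models are fit to the same data, the uncertainty of the difference is usually computed from the paired pointwise values, not from the two individual standard errors. The sketch below is plain Python, not ArviZ code; the function name and the pointwise numbers are made up for illustration, and the variance convention (n versus n - 1) may differ between implementations.

```python
import math

def elpd_difference(pointwise_a, pointwise_b):
    """Return (elpd_diff, se_diff) from per-observation elpd values of two models.

    Both inputs must refer to the same observations, in the same order.
    """
    if len(pointwise_a) != len(pointwise_b):
        raise ValueError("both models must be evaluated on the same observations")
    n = len(pointwise_a)
    # Paired differences: shared noise across models cancels out here.
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    total = sum(diffs)
    mean = total / n
    # Sample variance of the pointwise differences (n - 1 in the denominator).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    # Standard error of the summed difference.
    return total, math.sqrt(n * var)

# Hypothetical pointwise elpd values for two models on five observations.
elpd_a = [-1.0, -1.2, -0.8, -1.1, -0.9]
elpd_b = [-1.3, -1.4, -1.2, -1.2, -1.3]

diff, se = elpd_difference(elpd_a, elpd_b)
print(f"elpd_diff = {diff:.3f}, se_diff = {se:.3f}")
```

Because the differences are paired, the standard error of the difference can be much smaller than either model's own standard error. The paper linked above discusses when even this quantity can be unreliable.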

The main reference for the weights is probably the paper linked in the docstring (which we should fix and format as a proper reference): [1704.02030] Using stacking to average Bayesian predictive distributions
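To give an idea of what stacking does, here is a minimal sketch in plain Python. It is not the ArviZ implementation: it assumes you already have pointwise log predictive densities (for example from LOO) for each model, and it maximizes the log score of the weighted mixture with a simple EM-style fixed-point update instead of a general-purpose optimizer. The function name, the iteration count and the numbers are hypothetical.

```python
import math

def stacking_weights(pointwise_log_dens, n_iter=2000):
    """Weights on the simplex maximizing sum_i log(sum_k w_k * p_ik).

    pointwise_log_dens[k][i] is the log predictive density of observation i
    under model k.
    """
    n_models = len(pointwise_log_dens)
    n_obs = len(pointwise_log_dens[0])
    # Exponentiate after subtracting the per-observation maximum. This rescales
    # each observation by a constant, which does not change the optimal weights.
    dens = []
    for i in range(n_obs):
        col = [pointwise_log_dens[k][i] for k in range(n_models)]
        m = max(col)
        dens.append([math.exp(v - m) for v in col])
    # Start from equal weights and iterate the EM update for mixture weights.
    weights = [1.0 / n_models] * n_models
    for _ in range(n_iter):
        new = [0.0] * n_models
        for row in dens:
            mix = sum(w * p for w, p in zip(weights, row))
            for k in range(n_models):
                new[k] += weights[k] * row[k] / mix
        weights = [v / n_obs for v in new]
    return weights

# Hypothetical case 1: each model predicts a different half of the data well.
complementary = [
    [-0.5, -0.5, -3.0, -3.0],  # model A
    [-3.0, -3.0, -0.5, -0.5],  # model B
]
w_complementary = stacking_weights(complementary)

# Hypothetical case 2: model A is better on every single observation.
dominated = [
    [-1.0, -1.0, -1.0],  # model A
    [-2.0, -2.0, -2.0],  # model B
]
w_dominated = stacking_weights(dominated)

print(w_complementary, w_dominated)
```

In the first case both models keep a substantial weight because they complement each other, while in the second the dominated model's weight shrinks toward zero. The weights describe how to combine the models for prediction, so a threshold such as weight >= 0.99 is a statement about the optimal mixture and should be read with that in mind.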
