Comparing models and selecting them with the "weight" column of arviz.compare

How to interpret the standard errors is still somewhat of an open question; see, for example, [2008.10296] Uncertainty in Bayesian Leave-One-Out Cross-Validation Based Model Comparison.
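For context, this is a minimal sketch of the usual estimator for the standard error of an elpd difference: sum the pointwise LOO elpd differences and use a normal approximation whose variance is estimated from those pointwise differences. As far as I know, this is the computation behind the `dse` column of `arviz.compare`, but treat that as an assumption. The function name `elpd_diff_se` is hypothetical and the inputs are made-up illustration values, not output from a real model.

```python
import numpy as np


def elpd_diff_se(elpd_i_a, elpd_i_b):
    """Total elpd difference (model a minus model b) and its standard error.

    elpd_i_a, elpd_i_b: pointwise LOO elpd values for the same n observations.
    """
    diff = np.asarray(elpd_i_a, dtype=float) - np.asarray(elpd_i_b, dtype=float)
    n = diff.size
    # The total difference is a sum of n pointwise terms.
    elpd_diff = diff.sum()
    # Normal approximation: Var(sum) is estimated as n times the sample
    # variance of the pointwise differences.
    se_diff = np.sqrt(n * np.var(diff))
    return elpd_diff, se_diff


# Hypothetical pointwise values, for illustration only.
d, se = elpd_diff_se([0.0, 0.0], [-1.0, 1.0])
print(d, se)
```

A common heuristic compares `elpd_diff` to a few multiples of `se_diff`. The paper above discusses, among other cases, situations where this normal approximation can misrepresent the uncertainty, for instance when the models make very similar predictions or when n is small.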

The main reference for how the weights are computed is probably the paper linked in the docstring (which we should fix and format as a proper reference): [1704.02030] Using stacking to average Bayesian predictive distributions.
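To make the stacking idea concrete, this sketch computes weights that maximize sum_i log(sum_k w_k exp(lpd[i, k])) over the simplex, where lpd[i, k] is the LOO log predictive density of observation i under model k. This is the stacking objective from the paper; `arviz.compare` uses `method="stacking"` by default. The optimizer here is an assumption and is not ArviZ's: I use a simple EM-style fixed-point update for mixture weights, which increases the same concave objective at each step. The function name `stacking_weights` is hypothetical.

```python
import numpy as np


def stacking_weights(lpd_point, n_iter=1000, tol=1e-10):
    """Stacking weights from pointwise LOO log predictive densities.

    lpd_point: array of shape (n_obs, n_models).
    Returns nonnegative weights that sum to 1.
    """
    lpd_point = np.asarray(lpd_point, dtype=float)
    n_obs, n_models = lpd_point.shape
    # Subtract each row's max for numerical stability; scaling a row by a
    # positive constant only shifts the objective, so the maximizer is unchanged.
    dens = np.exp(lpd_point - lpd_point.max(axis=1, keepdims=True))
    w = np.full(n_models, 1.0 / n_models)  # start from uniform weights
    for _ in range(n_iter):
        mix = dens @ w  # mixture density per observation, shape (n_obs,)
        # EM update: average responsibility of each model over observations.
        w_new = w * (dens / mix[:, None]).mean(axis=0)
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return w / w.sum()


# Model 0 is better on the first two points, model 1 on the last two.
lpd = np.array([[0.0, -2.0], [0.0, -2.0], [-2.0, 0.0], [-2.0, 0.0]])
print(stacking_weights(lpd))
```

Because stacking optimizes the predictive performance of the combined mixture, two models that each predict different observations well can both receive substantial weight, and the weights should not be read as posterior probabilities that each model is correct. That is one reason selecting a single model by the largest weight is not the same as selecting by elpd.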