I am working on a problem where I want to compare a Bayesian model to an ML model. I was wondering if anyone has suggestions on which approaches would be fair for both methods. I have compared Bayesian models to one another using az.compare, but never a Bayesian model to ML models.

For example, I want to compare a hierarchical linear model to BART to an xgboost model. Is a classic k-fold CV a good approach, or are there other methods for OOS prediction that are preferable? Also, which definition of “error” is valid for both approaches while also being computationally feasible?

Would you want to compare point estimates from Bayesian models (i.e., take the MAP or expected value of the posterior) to point estimates from an ML model? CV would be the way to go. What you’re looking at with az.compare is the result of an approximation of CV that’s not possible to derive for a generic ML model.

A main strength of Bayesian models is that they give full posteriors, whereas uncertainty from ML models is usually tacked on afterward or has to be calibrated, perhaps with an additional Bayesian model, so keep that in mind when comparing point estimates! I think you can use whatever definition of error is most relevant to your context.

So what it comes down to is that our team is slightly split between a traditional ML approach and a Bayesian modeling approach (the direction I lean). The Bayesian approach is a clear winner when it comes to uncertainty; however, we have a lot of data, so using MCMC is significantly slower… (I am exploring VI as an in-between, but I am not quite there yet).

For example, one of the debates we are having is for some features, we model them as a level within a hierarchical model. However, for the ML approach we can only treat them as an additional predictor.

What you’re looking at with az.compare is the result of an approximation of CV that’s not possible to derive for a generic ML model.

I have loved using az.compare for comparing Bayesian models of different complexity, centered vs noncentered, etc., but I understand it isn’t applicable for comparing against ML models.

Would you want to compare point estimates from Bayesian models (i.e., take the MAP or expected value of the posterior) to point estimates from an ML model? CV would be the way to go.

While the uncertainty is important, in order to justify my claim I think I need some measure of OOS prediction accuracy… so I guess to make the comparison fair it would have to be a point-estimate comparison… so maybe something like R^2 or MSE with k-fold CV? Is it fair to compare something like a hierarchical Bayes model to an ML model of slightly different structure?
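To make this concrete, here is the kind of generic k-fold loop I have in mind — a minimal numpy-only sketch, with ordinary least squares as a hypothetical stand-in for either model. Anything that produces point predictions (the posterior mean for the Bayesian model, the raw output for xgboost) would plug into `fit`/`predict`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data; in practice these would be your real features/targets.
X = rng.normal(size=(200, 3))
beta = np.array([1.5, -2.0, 0.5])
y = X @ beta + rng.normal(scale=0.5, size=200)

def kfold_mse_r2(X, y, fit, predict, k=5, seed=0):
    """Generic k-fold CV: `fit` returns a model, `predict` returns point predictions."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errs, ss_res, ss_tot = [], 0.0, 0.0
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        pred = predict(model, X[test])
        errs.append(np.mean((y[test] - pred) ** 2))
        ss_res += np.sum((y[test] - pred) ** 2)
        ss_tot += np.sum((y[test] - y[test].mean()) ** 2)
    return np.mean(errs), 1 - ss_res / ss_tot

# OLS stands in for any model that yields point predictions.
fit_ols = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict_ols = lambda b, X: X @ b

mse, r2 = kfold_mse_r2(X, y, fit_ols, predict_ols)
print(f"CV MSE: {mse:.3f}, CV R^2: {r2:.3f}")
```

The point is that the CV harness only ever sees point predictions, so it is agnostic to whether they came from a posterior or a boosted ensemble.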

So what it comes down to is that our team is slightly split between a traditional ML approach and a Bayesian modeling approach (the direction I lean)

Been there! But why not both?

For example, one of the debates we are having is for some features, we model them as a level within a hierarchical model. However, for the ML approach we can only treat them as an additional predictor.

This is a great point. This situation is handled particularly well by a Bayesian model. For example, say your training data comes from a study done at 5 different hospitals, and using hospital_id as a predictor is very helpful because they did things a bit differently at each hospital. Maybe the ML model will do really well in CV using data from those 5 hospitals, but in the future how do you predict for a hospital that’s not one of those five? The only options are to remove that feature or do something really hacky. A hierarchical Bayesian model is perfect for handling this very common case.
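A toy sketch of that fallback behavior, using empirical-Bayes-style shrinkage as a simplified stand-in for the full hierarchical posterior (all numbers are simulated and the hospital setup is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated study: 5 hospitals with different baseline levels.
true_means = rng.normal(loc=10.0, scale=2.0, size=5)
n_per = 30
y = np.concatenate([rng.normal(m, 1.0, n_per) for m in true_means])
groups = np.repeat(np.arange(5), n_per)

sample_means = np.array([y[groups == g].mean() for g in range(5)])
grand_mean = y.mean()
sigma2 = 1.0  # within-hospital variance, assumed known for this sketch

# Method-of-moments estimate of between-hospital variance, floored at a tiny value.
tau2 = max(sample_means.var() - sigma2 / n_per, 1e-6)

# Partial pooling: shrink each hospital's mean toward the grand mean.
shrink = tau2 / (tau2 + sigma2 / n_per)
pooled = grand_mean + shrink * (sample_means - grand_mean)

# For a hospital never seen in training, fall back on the population level:
# predict the grand mean, with extra spread from the between-hospital variance.
new_hospital_pred = grand_mean
new_hospital_sd = np.sqrt(tau2 + sigma2)
print(pooled, new_hospital_pred, new_hospital_sd)
```

The key design point: the unseen-hospital prediction is not an error case — it is just the population-level distribution, with wider uncertainty than any in-sample hospital.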

I guess overall I’m of the opinion that if you don’t really know the data generation process and you have a lot of data, ML models will often perform better on various metrics (though you’re maybe overfitting a little), because they can adapt to all sorts of non-linear hypothesis spaces without you having to understand much of it. And if you don’t need uncertainty, that’s a great place to use them. But if you do have a good handle on the data generation process and you can represent that structure in a model, then your Bayesian model will probably win. I also think Bayesian models can be more useful because of the explainability aspect. In my experience, a lot of “why” questions come after forecasts or predictions are made, and that’s where Bayesian methods really shine.

But as far as your actual question about specific metrics… not sure! Coverage is another one you could consider, i.e., is the true value within the 80% posterior interval 80% of the time?
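Coverage can be checked directly from posterior predictive draws. A simulated sketch (the draws here are fabricated to be well calibrated by construction, so the empirical number should land near 0.80; in practice you would pull draws from your fitted model's posterior predictive instead):

```python
import numpy as np

rng = np.random.default_rng(2)

# Fake posterior predictive draws for each held-out observation,
# shape (n_draws, n_obs), centered on a latent mean per observation.
n_draws, n_obs = 2000, 500
mu = rng.normal(size=n_obs)
observed = mu + rng.normal(size=n_obs)               # held-out "true" values
draws = mu + rng.normal(size=(n_draws, n_obs))       # well-calibrated draws

lo, hi = np.percentile(draws, [10, 90], axis=0)      # central 80% interval
coverage = np.mean((observed >= lo) & (observed <= hi))
print(f"Empirical 80% coverage: {coverage:.2f}")
```

Coverage well below the nominal level suggests overconfident intervals; well above suggests intervals that are wider than they need to be.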

Totally agree and we definitely use both. Like you said, ML often wins when we have lots of data and don’t worry about some of the points you made about levels/predictors.

This is a great point. This situation is handled particularly well by a Bayesian model. For example, say your training data comes from a study done at 5 different hospitals, and using hospital_id as a predictor is very helpful because they did things a bit differently at each hospital. Maybe the ML model will do really well in CV using data from those 5 hospitals, but in the future how do you predict for a hospital that’s not one of those five? The only options are to remove that feature or do something really hacky. A hierarchical Bayesian model is perfect for handling this very common case.

Love this. Great explanation that really illustrates the point.

Thanks for all your points. I think this does help clarify when/why one is more suitable. Also the coverage suggestion seems really helpful.