Looking at BART, this seems like a pretty weird way to approximate a Gaussian process, assuming it converges to the same result in a limit. My question is why you’d use BART instead of a GP directly. Is BART generally faster, or is there some other advantage to it? If it’s speed, how does BART compare to other methods for quickly approximating a Bayesian regression?

Hi @ParadaCarleton thanks for your question.

Many models can be seen as particular instances or approximations to other models. For example neural networks, splines and even liner model csn be seen as GPs under the proper conditions. This property of one model been an approximation to another one can be very useful to theoretically gain a better understanding of the statistical properties of those models. But I wil not say that BART is trying to approximate a GP, as that’s not the goal of BART. Also there have been some proposals combining BART and GPs, where for example the values of the leaf nodes are sampled from a GP.

As you mentioned in your question other features come into play when working in practice. In the literature, BART is generally used as a method that requieres little user input and provides very good fit/predictions. So it seems like a good model/mehod for a low investment.

Other models like GPs generally requiere more user involvement, and sometimes that investment is needed to get better predictions or better understanding. Generally speking adding structure to a model should help to get a better model. BART by itself is not able to provide much structure, we may add ways to add more structure in the future.

Benchmark van be tricky and I don’t have a one that I have run with our implementation (maybe we should) and I don’t remember seing a benchmark from other implementation (but hey are probably out there), so I can not provide any accurate estimates of the speed diferences between BART and other models. Also speed will depends on the implementations, for example there is an XBART package in R, that has been designed for speed.

For our implementation we know we can improve speed and memory usage, and we have some ideas on how to do it. But so far the main focus has been on making a flexible BART implementation that can be used as a building block of user defined models. This contrast with all or most other implementations that focus on conjugate priors so a model/implementation for a poisson likelihood is different from the one wih a gaussian likelihood.

I see, that’s helpful. The reason I’m wondering when I should use each one is because they seem to be doing fundamentally the same task (nonparametric estimation of a function). When will BART outperform a Gaussian process, if ever? To me it feels like BART is trying to do the same thing as a GP, but with a much less natural connection to the underlying data-generating process. But as you mentioned, there might be practical reasons to favor BART, like performance or simplicity.

My understanding (which is far shallower than @aloctavodia ) is that BART typically requires very little in the way of tweaking, which may be seen as one advantage over GPs (depending on the user and the scenario). Relatedly, I don’t think BART makes any assumptions about smoothness (which could be good or bad, depending). And BART (like other tree-based methods) may handle multinomial outcomes more naturally. But that’s just a bit of speculation based on generalizing from other, related techniques.