Thanks for your interest! I’m really not at all familiar with gradient-boosted trees, so could you provide a rough example of your workflow? In the meantime, take a look at the Cars Dataset notebook and see if that helps clarify how Gumbi is used.
I’m guessing you’ll need to be a bit more careful with your features when using a GP than with a GBT. In Gumbi, you specify which of your variables need to be log- or logit-transformed when you create a gmb.DataSet from a pd.DataFrame. Then, when you go to build/fit the GP, you declare which inputs you’re actually interested in by specifying which should be treated as continuous variables (continuous_dims) and which as categorical (categorical_dims).
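Roughly, the flow looks like the sketch below. This is written from memory, all the column names are made up, and the exact keyword arguments (outputs, log_vars, continuous_dims) and the grid-prediction calls should be checked against the Cars Dataset notebook and the docs:

```python
import numpy as np
import pandas as pd
import gumbi as gmb

# Toy data standing in for your own table; every column name here is hypothetical
rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({
    'temperature': rng.uniform(20, 40, n),     # continuous input
    'concentration': rng.uniform(0.1, 10, n),  # continuous input, log-scaled
    'yield': rng.uniform(1, 100, n),           # output, log-scaled
})

# Declare the output column(s) and which variables need a log transform
ds = gmb.DataSet(df, outputs=['yield'], log_vars=['concentration', 'yield'])

# Build the GP, then fit it on just the inputs you care about
gp = gmb.GP(ds)
gp.fit(outputs=['yield'], continuous_dims=['temperature', 'concentration'])

# Predict over a dense grid of the continuous inputs (see the notebook for
# the plotting steps that usually follow)
X = gp.prepare_grid()
y = gp.predict_grid()
```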
Continuous variables are treated as you might expect, while categorical variables are used to define separate but correlated Gaussian Processes. If you have one categorical variable, each value in that column defines a separate GP; if you have multiple, each unique combination of categorical variables gets its own. This is done through what is known as the “Linear Model of Coregionalization”, or LMC. As the name suggests, the GP learns a linear correlation between the classes, but this can be positive or negative, and it will happily learn that there’s no correlation if the data indicate as much.
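Concretely, extending the sketch above with a (hypothetical) categorical column would look something like this: each level of that column gets its own GP, and the LMC learns how strongly they’re correlated.

```python
# Add a hypothetical categorical column to the toy frame above
df['strain'] = rng.choice(['A', 'B', 'C'], n)
ds = gmb.DataSet(df, outputs=['yield'], log_vars=['concentration', 'yield'])

gp = gmb.GP(ds)
# Each strain gets its own GP; the LMC learns the (positive, negative, or
# negligible) correlation between them from the data
gp.fit(outputs=['yield'],
       continuous_dims=['temperature', 'concentration'],
       categorical_dims=['strain'])
```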
Finally, multi-output regression is also achieved through an LMC implementation.
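Again as a sketch (and assuming a logit_vars argument for bounded outputs, which you should verify against the docs), listing more than one output when fitting gives you correlated multi-output regression:

```python
# A second, hypothetical output bounded between 0 and 1, hence logit-transformed
df['purity'] = rng.uniform(0.5, 0.99, n)
ds = gmb.DataSet(df, outputs=['yield', 'purity'],
                 log_vars=['concentration', 'yield'],
                 logit_vars=['purity'])

gp = gmb.GP(ds)
# Both outputs are modeled jointly; the LMC captures their correlation
gp.fit(outputs=['yield', 'purity'],
       continuous_dims=['temperature', 'concentration'])
```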
Hope that helps!