Gaussian Process regression with categorical features

I'm trying to train a Gaussian Process regression model on a mix of continuous and categorical inputs. I have read that the covariance functions traditionally used with Gaussian Process regression may not work well with categorical inputs.

I tried to understand the covariance functions available in PyMC3, but I cannot tell whether any of them works well with categorical inputs. I would appreciate any help identifying a suitable covariance function for categorical inputs.

Also, is there a way to combine two kernels while using a different set of features with each kernel?

As an example:

X1, X2 = X[features1], X[features2]
cov = nu ** 2 * pm.gp.cov.ExpQuad(X1.shape[1], l1) * pm.gp.cov.ExpQuad(X2.shape[1], l2)
gp =

Notice that each kernel has a different set of features. Can we do this? If so, how do we pass the data to each kernel?

Yep, different kernels can use different features. Use the input_dim and active_dims parameters of each kernel. This convention was shamelessly stolen from GPy and GPflow because it’s a good idea. input_dim is the total number of columns of X, and active_dims is a list of integer column indices that picks out which columns an individual kernel is applied to. You pass the full X to the GP, and each kernel selects its own columns. So for your case, you’d write:

cov = nu**2 * pm.gp.cov.ExpQuad(X.shape[1], ls=l1, active_dims=features1) * pm.gp.cov.ExpQuad(X.shape[1], ls=l2, active_dims=features2)
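
To make the column-selection idea concrete, here is a minimal plain-Python sketch of what a product of two squared-exponential kernels with separate active_dims computes for a single pair of rows. It is not PyMC3's implementation; the helper names (exp_quad, product_cov) and the example rows are made up for illustration, and features1/features2 are assumed to be lists of integer column indices.

```python
import math

def exp_quad(x, y, ls, active_dims):
    """Squared-exponential kernel evaluated only on the columns in active_dims."""
    sq_dist = sum((x[d] - y[d]) ** 2 for d in active_dims)
    return math.exp(-0.5 * sq_dist / ls ** 2)

def product_cov(x, y, nu, l1, l2, features1, features2):
    """nu**2 * k1 * k2, where each kernel only sees its own subset of columns."""
    return nu ** 2 * exp_quad(x, y, l1, features1) * exp_quad(x, y, l2, features2)

# Hypothetical layout: columns 0 and 1 feed the first kernel, column 2 the second.
features1, features2 = [0, 1], [2]
a = [0.0, 1.0, 5.0]
b = [0.5, 1.0, 7.0]

k_ab = product_cov(a, b, nu=2.0, l1=1.0, l2=3.0,
                   features1=features1, features2=features2)
k_aa = product_cov(a, a, nu=2.0, l1=1.0, l2=3.0,
                   features1=features1, features2=features2)
print(k_ab, k_aa)
```

Both rows are passed whole to every kernel; the selection happens inside the kernel, which is why the full X goes to the GP rather than pre-sliced X1 and X2.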

Regarding your first question, there aren’t any built-in kernels specifically for dealing with categorical inputs. One option is to one-hot encode your categories and use a standard ExpQuad on the encoded columns. You can also define a custom kernel pretty easily (see here). Or you could use the Coregion (coregionalization) kernel, possibly without modification.
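
As a rough illustration of the one-hot option, the sketch below applies a squared-exponential kernel to one-hot vectors in plain Python. The category names, the lengthscale, and the helper functions are hypothetical and only serve the example; this is not PyMC3 code.

```python
import math

def one_hot(category, categories):
    """Encode a category as a 0/1 vector with a single 1."""
    return [1.0 if category == c else 0.0 for c in categories]

def exp_quad(x, y, ls):
    """Squared-exponential kernel: exp(-||x - y||^2 / (2 * ls^2))."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-0.5 * sq_dist / ls ** 2)

categories = ["red", "green", "blue"]  # hypothetical levels
ls = 1.5                               # hypothetical lengthscale

red, green, blue = (one_hot(c, categories) for c in categories)

same = exp_quad(red, red, ls)             # identical categories
diff_rb = exp_quad(red, blue, ls)         # two different categories
diff_gb = exp_quad(green, blue, ls)       # another pair of different categories
print(same, diff_rb, diff_gb)
```

Two different one-hot vectors are always at squared distance 2, so the kernel takes only two values: 1 for matching categories and exp(-1 / ls**2) for any mismatch. In other words, one-hot plus ExpQuad treats every pair of distinct categories as equally similar. If some categories should be more alike than others, a kernel that learns a separate covariance entry for each pair of categories, which is the idea behind coregionalization, may be a better fit.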