Behaviour of Gibbs kernel seems off

Hmm, I think I understand. This might be tricky to do out of the box here, since `warp_gp_func` as it stands has to be a plain function. I know it's a bit of a bummer, but if I were you, I'd try writing a new covariance class using `Gibbs` as a starting point: drop `warp_func` as an input, and construct the lengthscale GP inside your `GibbsGP` class instead. Whether you are training or extrapolating is basically determined in the Covariance by whether `Xs` is set. Maybe that will give you enough flexibility?
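To make the shape of that concrete, here's a rough NumPy sketch of what I mean. The class name `GibbsGP`, the `ell_func` attribute, and the fixed lengthscale function are all placeholders: in your version, `ell_func` would be replaced by the predicted mean of the lengthscale GP you build inside the class, and `full` would match whatever signature the library's Covariance base class expects. The `Xs is None` branch is the training-vs-extrapolation switch I mentioned.

```python
import numpy as np

class GibbsGP:
    """Sketch of a Gibbs covariance whose lengthscale function lives
    inside the class, rather than being passed in as warp_func."""

    def __init__(self, ell_func):
        # ell_func maps inputs (n,) -> positive lengthscales (n,).
        # Stand-in for evaluating an internally-constructed lengthscale GP.
        self.ell_func = ell_func

    def full(self, X, Xs=None):
        # Training: Xs is None, so we return K(X, X).
        # Extrapolating: Xs is given, so we return the cross-covariance K(X, Xs).
        if Xs is None:
            Xs = X
        lx = self.ell_func(X)[:, None]    # shape (n, 1)
        ls = self.ell_func(Xs)[None, :]   # shape (1, m)
        denom = lx**2 + ls**2
        # Gibbs (non-stationary) kernel:
        # k(x, x') = sqrt(2 l(x) l(x') / (l(x)^2 + l(x')^2))
        #            * exp(-(x - x')^2 / (l(x)^2 + l(x')^2))
        prefac = np.sqrt(2.0 * lx * ls / denom)
        sqdist = (X[:, None] - Xs[None, :]) ** 2
        return prefac * np.exp(-sqdist / denom)
```

Calling `full(X)` during training gives the square, symmetric kernel matrix with ones on the diagonal; calling `full(X, Xs)` at prediction time gives the rectangular cross-covariance against the new points.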