You can code it into the GP as a mean function, actually in gp.Latent it is default to reparameterized so it is just like GP(0, k(x, x)) + Xb (https://github.com/pymc-devs/pymc3/blob/c760a85299693ea81c39081077f982916a5055a9/pymc3/gp/gp.py#L109), but since you are going to use sparse approximation as well, it is easier probably to pop the linear prediction out and add it specifically after the GP is initialized.