May I ask how you implemented the GP in PyMC? I think the best way to go in this case could be to use the marginal likelihood implementation. That ought to make PyMC’s sampling a little bit easier by integrating out the GP and reducing the number of parameters that have to be sampled. To recreate what GPyTorch is doing, you should be able to use the MAP estimate. Out of interest: how many data points you are training on, and also how fast should the model be?