What happens if you use find_MAP
instead of NUTS? Then, use testvals to initialize the hyperparameters the same way as sklearn for a more apples to apples comparison.
I’m not terribly surprised that full MCMC is slower than optimization. Also, the speed of MCMC with GP’s depends a lot on the priors placed on the hyperparameters.