Gaussian Process Regression: PyMC3 much slower than Scikit-learn

This is a question about Gaussian Process (GP) regression on a pedagogical function, f(x, y, z) = \exp(-(x^2 + yz - z)).

While training scikit-learn GaussianProcessRegressor takes less than half a second, training the GP with PyMC3 takes more than 2 minutes. Hence I would like to know:

  1. Are there ways for me to speed up PyMC3, except for changing ncores in pm.sample() or reducing the number of sampling N = 200?
  2. The causes for the significant time difference?

I’m mostly concern with emulating the function with great accuracy, so that I can use the emulated function to replace some computationally expensive function, and do things like MCMC.

Why don’t I just use scikit-learn? Because I have a hard time using results from scikit-learn inside other PyMC3 models to do Bayesian inference, whereas GP emulator trained with PyMC3 can, of course, easily be fed to the next stage in the analysis pipeline.


import warnings
warnings.simplefilter('ignore', UserWarning)
warnings.simplefilter('ignore', FutureWarning)
import numpy as np
import pandas as pd

truth_model = lambda x, y, z: np.exp(-(x**2 + y * z - z))

"""prepare training data"""
N = 200
import skopt # for doing Latin hypercube sampling (LHS)
space = skopt.Space([[-1.0, 1.0] for _ in range(3)])
sampler = skopt.sampler.Lhs(criterion='maximin', iterations=int(1e3))
df_train = np.array(sampler.generate(space.dimensions, N))
df_train = pd.DataFrame(df_train, columns=['x', 'y', 'z'])
df_train['f'] = truth_model(df_train['x'], df_train['y'], df_train['z'])
df_train['f'] += np.random.normal(scale=1e-3, size=len(df_train)) # add random noise

"""Scikit-learn Gaussian Process Regressor training"""
"""This step takes about 200 millisecond"""
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, DotProduct, WhiteKernel, RBF
kernel = ConstantKernel(constant_value=1.0, constant_value_bounds=(0.0, 10.0))\
    * RBF(length_scale=0.5, length_scale_bounds=(0.0, 10.0))\
    + RBF(length_scale=2.0, length_scale_bounds=(0.0, 10.0))
sk_gpr = GaussianProcessRegressor(kernel=kernel, random_state=0)
sk_gpr =[['x', 'y', 'z']], df_train['f'])

"""PyMC3 GP training"""
"""This step takes about 2 min"""
import pymc3 as pm
with pm.Model() as model:
    ls = pm.Gamma('ls', 2, 1, shape=3)
    cov =, ls=ls)
    gp =
    eps = pm.HalfNormal('eps', 10)
    y_pred = gp.marginal_likelihood('y_pred',\
                                    X=df_train[['x', 'y', 'z']].to_numpy(),\
    trace = pm.sample(2000, tune=1000, cores=1)
point = {'ls': np.mean(trace['ls'], axis=0), 'eps': trace['eps'].mean()}
def pymc3_model(x, y, z):
    X = np.array([x, y, z]).T
    return gp.predict(X, point=point, diag=True)[0]

Just to make sure both emulators perform with good accuracy, I have also made some simple test by eye. Basically, they give very similar results.

X_test = pd.DataFrame(np.random.uniform(-1.0, 1.0, size=(10, 3)), columns=['x', 'y', 'z'])
df_verify = pd.DataFrame({\
    'truth': truth_model(X_test['x'], X_test['y'], X_test['z']),\
    'sklearn': sk_gpr.predict(X_test),\
    'pymc3': pymc3_model(X_test['x'], X_test['y'], X_test['z']),\
df_verify['sklearn_dev'] = df_verify['sklearn'] - df_verify['truth']
df_verify['pymc3_dev'] = df_verify['pymc3'] - df_verify['truth']

What happens if you use find_MAP instead of NUTS? Then, use testvals to initialize the hyperparameters the same way as sklearn for a more apples to apples comparison.

I’m not terribly surprised that full MCMC is slower than optimization. Also, the speed of MCMC with GP’s depends a lot on the priors placed on the hyperparameters.