Gaussian Processes: Sampling kernel hyperparameters?

From the official documentation on PyMC3’s Gaussian Process implementation(s), and more specifically, the Marginal class:

# A one dimensional column vector of inputs.
X = np.linspace(0, 1, 10)[:, None]

with pm.Model() as model:
    # Specify the covariance function.
    cov_func = pm.gp.cov.ExpQuad(1, ls=0.1)

    # Specify the GP.  The default mean function is `Zero`.
    gp = pm.gp.Marginal(cov_func=cov_func)

    # Place a GP prior over the function f.
    sigma = pm.HalfCauchy("sigma", beta=3)
    y_ = gp.marginal_likelihood("y", X=X, y=y, noise=sigma)

...

# After fitting or sampling, specify the distribution
# at new points with .conditional
Xnew = np.linspace(-1, 2, 50)[:, None]

with model:
    fcond = gp.conditional("fcond", Xnew=Xnew)

I see that a prior is placed over sigma, the error term, and that’s it. The kernel basically decides how squiggly/smooth your function should be, yet priors have not been placed on the kernel terms sigma_f and ls. So now I’m wondering, is explicitly modeling the kernel hyperparameters an atypical design choice? (Perhaps this was not included for the sake of tutorial brevity.)

Out of curiosity, how were 1 and 0.1 selected as Kernel hyperparameters? I’m uncertain on the ranges that they should take on. (Naively, I might assume both sigma_f and ls should fall within range [0,1] in the RBF/ExpQuad kernel.)

I ask these questions as it’s my understanding that both the Prior and Posterior distributions in a Gaussian Process are distributions over functions. And so it’s my (naive) belief that pm.gp.cov.ExpQuad(1, ls=0.1) is just one such function, not a distribution of functions.

Could anyone clarify?

And… crickets!

Hi there,

I wasn’t involved in designing the example, but I have worked with GPs quite a lot, so I may have some comments for you regarding your questions:

  • Fixing the parameters for the covariance function is unusual, in my opinion. I expect that this was just a choice made to keep the documentation succinct, as you suggest. I can’t think of a compelling reason for the two values chosen apart from them being convenient example values. Unless you have a very good reason for fixing them, you would probably want to do inference over them too. There seem to be longer tutorials that you might like which cover this: Gaussian Process Regression — PyMC3 3.1rc3 documentation
  • Regarding the question of whether fixing the hyperparameters means that the GP is no longer a distribution over functions, I believe the answer to that is no: this is still a distribution over functions. It’s just that it’s a distribution that places more prior mass on functions that have a similar lengthscale and variance to the one specified. You can see that there is still a lot of variation in the functions that are consistent with this prior if you sample from it:
import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt

X = np.linspace(0, 1, 100)[:, None]

with pm.Model() as model:
    # Specify the covariance function.
    cov_func = pm.gp.cov.ExpQuad(1, ls=0.1)

    # Specify the GP.  The default mean function is `Zero`.
    gp = pm.gp.Latent(cov_func=cov_func)

    # Place a GP prior over the function f.
    f = gp.prior("f", X=X)

    prior_checks = pm.sample_prior_predictive(samples=50, random_seed=3)

plt.plot(X,  prior_checks['f'].T)
plt.title('Draws from GP')
plt.gcf().tight_layout()

The last line should produce something like this:

So you can see that even this prior is still consistent with a whole range of possible functions. Hope this answers your questions, let me know!

3 Likes