How to put a restriction on the value of hyper-parameters in Gaussian process

Jiadaren · October 23, 2020, 12:21am

Hi All,

I have recently been learning how to implement Gaussian process regression in PyMC3, and I have read some examples from others. For the hyper-parameters (e.g., the length scale and signal variance in the squared exponential kernel), we usually first give them a prior, then use MAP or other methods to get optimum/posterior values.

To avoid over-fitting, I want to put a restriction on the value of some hyper-parameters. For example, I would like to restrict the optimum value of the length scale to be larger than 0.01. Is there any way I can achieve this by adding some code? Thanks.

BioGoertz · October 23, 2020, 9:12am

Hi Jiadaren!

Many people point to Michael Betancourt’s blog post on exactly this question. You should read through the whole thing, as he gives a very good overview of the problem, but the essence is that he suggests putting 99% of the (Inverse Gamma) prior density on the lengthscale between the shortest and longest pairwise distances in your dataset. My code for doing this looks something like:

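import pymc3 as pm
from scipy.spatial.distance import pdist

# X is your (n_samples, n_features) array of inputs
distances = pdist(X)

# Shortest and longest non-zero pairwise distances
ℓ_l = distances[distances != 0].min()
ℓ_u = distances[distances != 0].max()
# Centre the prior between the two, with each bound 3 standard deviations from the mean
ℓ_σ = (ℓ_u - ℓ_l) / 6
ℓ_μ = ℓ_l + 3 * ℓ_σ

with pm.Model() as model:
    ℓ = pm.InverseGamma('ℓ', mu=ℓ_μ, sigma=ℓ_σ)
    η = pm.Gamma('η', alpha=2, beta=1)
    ...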

Note that this doesn’t chop the prior off at the lower bound, but it does make it much more likely that your MAP will be above it.
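To see how soft that bound actually is, here is a minimal sketch using only NumPy and SciPy. It assumes the `mu`/`sigma` arguments are turned into an inverse gamma shape and scale by moment matching, and it uses a made-up `X` as a stand-in for your own inputs. The ±3σ rule is a Gaussian heuristic, so the mass between the two distances is only roughly 99%, and the sketch prints the real figure. The helper `truncated_logpdf` is a hypothetical name that shows what a hard cut-off at 0.01 does to the log-density. In PyMC3 itself, a hard lower limit is usually imposed with `pm.Bound(pm.InverseGamma, lower=0.01)`.

```python
import numpy as np
from scipy import stats
from scipy.spatial.distance import pdist

# Hypothetical inputs: 50 random 1-D locations standing in for your own X.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(50, 1))

# Same recipe as in the post above, with ASCII names.
distances = pdist(X)
l_lower = distances[distances != 0].min()
l_upper = distances[distances != 0].max()
sigma = (l_upper - l_lower) / 6
mu = l_lower + 3 * sigma

# Moment matching (an assumption about how mu/sigma are interpreted):
# an inverse gamma with shape alpha and scale beta has
# mean beta / (alpha - 1) and variance beta**2 / ((alpha - 1)**2 * (alpha - 2)).
alpha = 2.0 + (mu / sigma) ** 2
beta = mu * (alpha - 1.0)
prior = stats.invgamma(a=alpha, scale=beta)

# How much prior mass sits between the bounds, and how much leaks below.
mass_inside = prior.cdf(l_upper) - prior.cdf(l_lower)
mass_below = prior.cdf(l_lower)
print(f"prior mass in [{l_lower:.4f}, {l_upper:.4f}]: {mass_inside:.4f}")
print(f"prior mass below the lower bound: {mass_below:.2e}")


def truncated_logpdf(x, lower=0.01):
    """Log-density of the same prior truncated to x > lower (renormalised)."""
    x = np.asarray(x, dtype=float)
    logp = prior.logpdf(x) - prior.logsf(lower)
    return np.where(x > lower, logp, -np.inf)
```

Below the cut-off the truncated log-density is `-inf`, so an optimiser can never return a lengthscale under 0.01. Above it, the density differs from the untruncated one only by a constant, so the location of the MAP is unchanged whenever it already lies above the bound.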

AlexAndorra · October 26, 2020, 9:25am

Hi,
To build on @BioGoertz’s answer, you can also read the very good Mauna Loa example (in 2 parts) on PyMC’s website.

Here, the priors are motivated by domain knowledge, which is a good complement to Betancourt’s example, because sometimes it doesn’t make sense to chop off prior probability for lengthscales shorter or longer than those in the observed data. It may well be that the data you have observed so far are not fully representative of the data that can occur (some new data can be on a much shorter or longer lengthscale), and then your model will have trouble.
So here, a solution would be to add new data or, if that is not possible, add more domain knowledge through the prior.

Hope this helps

Jiadaren · October 30, 2020, 7:43am

Thank you very much, BioGoertz. I think this blog post answers my question, although I cannot fully understand it right now. Anyway, I will go through it in detail. Thanks again.

Jiadaren · October 30, 2020, 7:50am

Hi Alex,
Your help here is greatly appreciated. Yes, priors that include domain knowledge are a good approach. I am also trying to find some knowledge that I can apply to my priors, but it seems pretty hard since I have not found similar previous studies. Anyway, this is a good approach, and thanks a lot.
