How to put a restriction on the value of hyper-parameters in Gaussian process

Hi All,

I have recently been learning how to implement Gaussian process regression in PyMC3, and I have read some examples from others. For the hyper-parameters (e.g., the length scale and signal variance in the squared exponential kernel), the usual approach is to place a prior on each one and then use MAP or another method to get optimum/posterior values.

To avoid over-fitting, I would like to restrict the values of some hyper-parameters. For example, I would like the optimum value of the length scale to be larger than 0.01. Is there any way to achieve this by adding some code? Thanks.

Hi Jiadaren!

Many people point to Michael Betancourt’s blog post on exactly this question. It is worth reading in full, since he gives a very good overview of the problem, but the essence is that he suggests placing 99% of the (inverse gamma) prior mass on the lengthscale between the shortest and longest pairwise distances in your dataset. My code for doing this looks something like:

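import pymc3 as pm
from scipy.spatial.distance import pdist

# Pairwise distances between the input points X (shape: n_points x n_dims)
distances = pdist(X)

# Shortest and longest non-zero pairwise distances bracket the lengthscale
ℓ_l = distances[distances!=0].min()
ℓ_u = distances[distances!=0].max()
# Treat [ℓ_l, ℓ_u] as roughly the mean ± 3 standard deviations
ℓ_σ = (ℓ_u-ℓ_l)/6
ℓ_μ = ℓ_l + 3*ℓ_σ

with pm.Model() as model:
    # Lengthscale prior, parameterised by its mean and standard deviation
    ℓ = pm.InverseGamma('ℓ', mu=ℓ_μ, sigma=ℓ_σ)
    # Prior on the kernel amplitude
    η = pm.Gamma('η', alpha=2, beta=1)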

Note that this doesn’t chop the prior off at the lower bound, but it does make it much more likely that your MAP will be above it.
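
If you want the tail condition matched more closely than the mean/standard-deviation shortcut above, here is a minimal sketch using only SciPy. It assumes the target is 0.5% of the prior mass below the lower bound and 0.5% above the upper bound (so 99% in between). The function name `inverse_gamma_from_bounds`, the `tail_mass` parameter, and the search bracket for alpha are all made up for illustration; none of them come from PyMC3 or from the blog post.

```python
import numpy as np
from scipy import optimize, stats


def inverse_gamma_from_bounds(lower, upper, tail_mass=0.005):
    """Return (alpha, beta) of an inverse gamma with
    P(x < lower) = P(x > upper) = tail_mass (illustrative helper)."""
    if not 0 < lower < upper:
        raise ValueError("need 0 < lower < upper")

    target = np.log(upper / lower)

    def log_quantile_ratio_gap(alpha):
        # beta only rescales the distribution, so the ratio of the two
        # quantiles depends on alpha alone.
        q_low = stats.invgamma.ppf(tail_mass, alpha)
        q_high = stats.invgamma.ppf(1 - tail_mass, alpha)
        return np.log(q_high / q_low) - target

    # Assumed search bracket for alpha; brentq raises ValueError if the
    # requested ratio upper/lower cannot be reached inside it.
    alpha = optimize.brentq(log_quantile_ratio_gap, 0.2, 1e4)
    # Choose the scale so the lower quantile lands exactly on `lower`.
    beta = lower / stats.invgamma.ppf(tail_mass, alpha)
    return alpha, beta
```

Something like `alpha, beta = inverse_gamma_from_bounds(ℓ_l, ℓ_u)` could then feed `pm.InverseGamma('ℓ', alpha=alpha, beta=beta)`. This still leaves a little prior mass below the lower bound. A hard cutoff such as ℓ > 0.01 would need a bounded prior instead; in PyMC3, `pm.Bound(pm.InverseGamma, lower=0.01)` should give you that, though I have not shown it here.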


To build on @BioGoertz’s answer, you can also read the very good Mauna Loa example (in 2 parts) on PyMC’s website.

Here, the priors are motivated by domain knowledge, which is a good complement to Betancourt’s example. Sometimes it doesn’t make sense to cut off prior probability for lengthscales shorter or longer than those in the observed data: the data you have observed so far may not be fully representative of what can occur (new data could vary on a much shorter or longer lengthscale), and then your model will run into trouble.
In that case, a solution is to add new data or, if that is not possible, to add more domain knowledge through the prior.

Hope this helps :vulcan_salute:

Thank you very much, BioGoertz. I think this blog post answers my question, although I do not fully understand it yet. I will go through it in detail. Thanks again.

Hi Alex,
your help here is greatly appreciated. Yes, building domain knowledge into the priors is a good approach. I am also trying to find some knowledge that I can apply to my priors, but it seems pretty hard since I have not found any similar previous studies. Anyway, this is a good approach, and thanks a lot.