# How to put a restriction on the value of hyper-parameters in a Gaussian process

Hi All,

I have recently been learning how to implement Gaussian process regression in PyMC3, and I have read some examples from others. For the hyper-parameters (e.g., the length scale and signal variance in the squared exponential kernel), we usually first place a prior on them, then use MAP or other methods to obtain optimal or posterior values.

To avoid over-fitting, I want to restrict the values of some hyper-parameters. For example, I would like to restrict the optimum value of the length scale to be larger than 0.01. Is there a way to achieve this in code? Thanks.

Many people point to Michael Betancourt’s blog post on exactly this question. It's worth reading in full, since he gives a very good overview of the problem, but the essence is that he suggests putting 99% of the prior probability mass of the (inverse-gamma) lengthscale prior between the shortest and longest pairwise distances in your dataset. My code for doing this looks something like:

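```python
import pymc3 as pm  # on PyMC v4+ use: import pymc as pm
from scipy.spatial.distance import pdist

# X: array of input locations with shape (n_points, n_dims)
distances = pdist(X)

# Shortest and longest non-zero pairwise distances
ℓ_l = distances[distances != 0].min()
ℓ_u = distances[distances != 0].max()
# Pick mean and sd so that mean ± 3 sd spans [ℓ_l, ℓ_u]
ℓ_σ = (ℓ_u - ℓ_l) / 6
ℓ_μ = ℓ_l + 3 * ℓ_σ

with pm.Model() as model:
    ℓ = pm.InverseGamma('ℓ', mu=ℓ_μ, sigma=ℓ_σ)
    η = pm.Gamma('η', alpha=2, beta=1)
    ...  # rest of the GP model
```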
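The mean ± 3 sd heuristic above does not directly control how much prior mass falls outside the distance range. To target the stated 99% more directly, one can solve for the inverse-gamma parameters so that a chosen tail probability lies below the shortest distance and above the longest one. The following is a minimal SciPy sketch of that tail-matching idea, not code from Betancourt's post: the helper name `invgamma_from_range`, the `tail=0.005` default (0.5% in each tail, so 99% in between), the root-finding bracket, and the toy inputs `X` are all assumptions. It uses the fact that if ℓ is inverse-gamma distributed then 1/ℓ is gamma distributed, which reduces the problem to a one-dimensional root-find on α.

```python
import numpy as np
from scipy import optimize, stats
from scipy.spatial.distance import pdist


def invgamma_from_range(lower, upper, tail=0.005):
    """Return (alpha, beta) of InverseGamma(alpha, beta) such that
    P(l < lower) = tail and P(l > upper) = tail (hypothetical helper)."""
    # If l ~ InverseGamma(alpha, beta), then 1/l ~ Gamma(alpha, rate=beta).
    # For fixed alpha, the ratio of the Gamma's upper and lower tail
    # quantiles does not depend on beta, so solve for alpha alone first.
    target_ratio = upper / lower

    def ratio_gap(alpha):
        q_lo = stats.gamma.ppf(tail, alpha)
        q_hi = stats.gamma.ppf(1 - tail, alpha)
        return q_hi / q_lo - target_ratio

    # Assumed bracket: wide enough for distance ratios between ~1.02 and ~1e23.
    alpha = optimize.brentq(ratio_gap, 0.1, 1e5)
    # Scale beta so that P(l > upper) = P(Gamma(alpha, 1) < beta / upper) = tail.
    beta = upper * stats.gamma.ppf(tail, alpha)
    return alpha, beta


# Hypothetical inputs: 11 evenly spaced 1-D points (distances from 1 to 10).
X = np.linspace(0, 10, 11)[:, None]
distances = pdist(X)
distances = distances[distances > 0]

alpha, beta = invgamma_from_range(distances.min(), distances.max())
prior = stats.invgamma(a=alpha, scale=beta)  # same (alpha, beta) parameterization
print(prior.cdf(distances.min()), prior.sf(distances.max()))  # both close to 0.005
# In the PyMC model: ℓ = pm.InverseGamma('ℓ', alpha=alpha, beta=beta)
```

Solving for α first and then β keeps the root-find one-dimensional and well bracketed, which tends to be more robust than a two-variable solver started from a poor initial guess.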

Note that this doesn’t chop the prior off at the lower bound, but it does make it much more likely that your MAP estimate will be above it.
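If you really need a hard lower bound on the optimum, as in the original question, one option outside PyMC is to fit the hyper-parameters with a bounded optimizer. The sketch below is standalone SciPy code, not PyMC: it minimizes the negative log marginal likelihood of a zero-mean GP with a squared exponential kernel (lengthscale `ell`, amplitude `eta`, noise `sigma`) using L-BFGS-B with `ell >= 0.01`. The function names, initial values, the 1e-8 jitter, and the toy sine data are assumptions for illustration. It contains no prior terms, so it is type-II maximum likelihood; adding log-prior terms to the objective would make it a MAP estimate. Within PyMC3 itself, a hard bound would instead be expressed through a bounded prior (e.g. `pm.Bound`); check the documentation for your version.

```python
import numpy as np
from scipy import optimize
from scipy.linalg import cho_factor, cho_solve
from scipy.spatial.distance import cdist


def neg_log_marginal_likelihood(params, X, y):
    """Negative log marginal likelihood of a zero-mean GP with a squared
    exponential kernel (lengthscale ell, amplitude eta) plus Gaussian noise."""
    ell, eta, sigma = params
    sq_dists = cdist(X, X, "sqeuclidean")
    K = eta**2 * np.exp(-0.5 * sq_dists / ell**2)
    K += (sigma**2 + 1e-8) * np.eye(len(X))  # noise variance plus small jitter
    try:
        c, lower = cho_factor(K, lower=True)
    except np.linalg.LinAlgError:
        return 1e25  # treat a numerically singular K as a very poor fit
    alpha = cho_solve((c, lower), y)  # K^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(c)))
    return 0.5 * y @ alpha + 0.5 * log_det + 0.5 * len(X) * np.log(2.0 * np.pi)


def fit_hyperparameters(X, y, ell_min=0.01):
    """Minimize the negative log marginal likelihood subject to ell >= ell_min."""
    x0 = np.array([1.0, 1.0, 0.1])  # initial guesses for ell, eta, sigma
    bounds = [(ell_min, None), (1e-6, None), (1e-6, None)]
    return optimize.minimize(
        neg_log_marginal_likelihood, x0, args=(X, y),
        method="L-BFGS-B", bounds=bounds,
    )


# Hypothetical toy data: noisy samples of sin(x) on [0, 5].
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

res = fit_hyperparameters(X, y, ell_min=0.01)
ell_hat, eta_hat, sigma_hat = res.x
print(f"ell = {ell_hat:.3f}, eta = {eta_hat:.3f}, sigma = {sigma_hat:.3f}")
```

L-BFGS-B keeps every iterate inside the box, so the returned lengthscale can never fall below `ell_min`; if the data favour a shorter lengthscale, the optimum simply sits on the bound.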


Hi,
To build on @BioGoertz’s answer, you can also read the very good Mauna Loa example (in 2 parts) on PyMC’s website.

There, the priors are motivated by domain knowledge, which is a good complement to Betancourt’s approach: sometimes it doesn’t make sense to cut off prior probability for lengthscales shorter or longer than those in the observed data. The data you have observed so far may not be fully representative of the data that can occur (new data may vary on a much shorter or longer lengthscale), and then your model will have trouble.
In that case, a solution is to collect more data or, if that isn’t possible, to add more domain knowledge through the prior.

Hope this helps

Thank you very much, BioGoertz. I think this blog post answers my question, although I can’t fully understand it yet. I will go through it in detail. Thanks again.

Hi Alex,
Your help here is greatly appreciated. Yes, including domain knowledge in the priors is a good approach. I am also trying to find knowledge I can apply to my priors, but it seems pretty hard since I have not found similar previous studies. Anyway, this is a good approach, and thanks a lot.
