Joint Optimization of GPs

In PyMC3, when we create two GPs with a squared exponential kernel inside a model with shared hyperparameters (lengthscale and output variance), are they jointly optimised? If yes, how is it done under the hood?

y1 ~ N(gp1,sigma1)
y2 ~ N(gp2,sigma2)

p(lengthscale, output variance | y1) ∝ p(y1 | lengthscale, output variance) p(lengthscale, output variance)
p(lengthscale, output variance | y2) ∝ p(y2 | lengthscale, output variance) p(lengthscale, output variance)

Does it call the NUTS sampler for both y1 and y2? How can we do joint optimisation? I would like to know what is happening in the backend.

Best Regards,
Rohan

Yes, they are sampled jointly. Once you have the full model written down, the joint likelihood (and hence the joint posterior) is fully constructed: because y1 and y2 share the lengthscale and output variance, p(lengthscale, output variance | y1, y2) ∝ p(y1 | lengthscale, output variance) p(y2 | lengthscale, output variance) p(lengthscale, output variance). The question is then how you use that function. Optimization is one way (finding an interesting point in this high-dimensional geometry, such as a local maximum); sampling is another (so that you can approximate E[f(x)] by averaging f(x_i) over posterior draws). You can learn more about the likelihood function here: https://github.com/junpenglao/All-that-likelihood-with-PyMC3, which should naturally answer questions like "How can we do joint optimisation for both y1 and y2?".
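As a concrete illustration (a minimal sketch, not code from your actual model; the data, priors, and variable names below are made up), this is how the two marginal likelihoods end up as terms in one joint log-probability, so a single call to `pm.sample()` (or `pm.find_MAP()`) works with the shared lengthscale and output variance for both GPs at once:

```python
import numpy as np
import pymc3 as pm

# Toy data (hypothetical): two independent observation sets over 1-D inputs.
X1 = np.linspace(0, 10, 30)[:, None]
X2 = np.linspace(0, 10, 25)[:, None]
y1_obs = np.sin(X1).ravel() + 0.1 * np.random.randn(30)
y2_obs = np.sin(X2).ravel() + 0.2 * np.random.randn(25)

with pm.Model() as model:
    # Shared hyperparameters: one lengthscale and one output scale for both GPs.
    ell = pm.Gamma("lengthscale", alpha=2, beta=1)
    eta = pm.HalfNormal("eta", sigma=2)  # output scale (sqrt of output variance)
    cov = eta ** 2 * pm.gp.cov.ExpQuad(1, ls=ell)

    # Two GPs built from the same covariance function.
    gp1 = pm.gp.Marginal(cov_func=cov)
    gp2 = pm.gp.Marginal(cov_func=cov)

    # Separate noise levels, as in y1 ~ N(gp1, sigma1) and y2 ~ N(gp2, sigma2).
    sigma1 = pm.HalfNormal("sigma1", sigma=1)
    sigma2 = pm.HalfNormal("sigma2", sigma=1)

    # Both marginal likelihoods are added as terms of the same joint log-probability.
    y1_ = gp1.marginal_likelihood("y1", X=X1, y=y1_obs, noise=sigma1)
    y2_ = gp2.marginal_likelihood("y2", X=X2, y=y2_obs, noise=sigma2)

    # One NUTS run samples (lengthscale, eta, sigma1, sigma2) from the joint posterior.
    trace = pm.sample(1000, tune=1000)

    # Alternatively, joint optimization: find a mode of the same joint posterior.
    map_point = pm.find_MAP()
```

Under the hood, each `marginal_likelihood` call contributes a multivariate normal logp term to the model's total log-density, and NUTS takes gradients of that single joint log-density with respect to all free parameters, which is why the shared hyperparameters are informed by both y1 and y2.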

As for the technical details of the NUTS sampler, it is best to start with https://arxiv.org/abs/1701.02434.
