I’m a bit unsure what the likelihood is for the latent implementation of a Gaussian process. For example, if I construct a model that contains a GP component and then sample from a normal distribution using pm.Normal, what likelihood is being sampled?
Say my model is f = A + B*GP, where A and B are sampled parameters and GP is a Gaussian process prior. If the lengthscale of the GP is also a sampled parameter, is it then optimised by the likelihood here? i.e. does the likelihood contain the term penalising small lengthscales?
I’m a bit new to Gaussian processes so any help is appreciated.
Sure! So for the likelihood vs prior question, the numerator of Bayes’ theorem has both the likelihood and the prior(s) next to each other. Which distribution is part of the prior and which is part of the likelihood depends on where the observed data is: the likelihood is p(D | \theta), and the prior is p(\theta). In a PyMC model, this is indicated by where observed is set.
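To make that split visible, here is a minimal made-up example (the data and the priors are just placeholders): the distribution that receives observed is the likelihood, everything else is part of the prior.

```python
import numpy as np
import pymc as pm

data = np.array([1.2, 0.8, 1.5])  # hypothetical observed data

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)            # no observed -> part of the prior p(theta)
    pm.Normal("obs", mu=mu, sigma=1.0, observed=data)   # observed set -> the likelihood p(D | theta)
```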
For example, if I construct a model that contains a GP component and then sample from a normal distribution using pm.Normal, what likelihood is being sampled?
The Normal distribution. If your GP is called, say, g, and your observed data is y, you could write the likelihood as N(y \mid g, \sigma), and the Gaussian process prior as MVN(g \mid 0, K).
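In PyMC code, a sketch of the model from your question could look like this (assuming made-up 1D inputs X and data y, an ExpQuad covariance, and arbitrary priors for A, B, the lengthscale, and the noise):

```python
import numpy as np
import pymc as pm

# hypothetical data
X = np.linspace(0, 10, 50)[:, None]
y = np.random.normal(size=50)

with pm.Model() as model:
    # priors, p(theta)
    A = pm.Normal("A", 0.0, 1.0)
    B = pm.HalfNormal("B", 1.0)
    ell = pm.Gamma("ell", alpha=2.0, beta=1.0)      # lengthscale prior
    cov = pm.gp.cov.ExpQuad(input_dim=1, ls=ell)
    gp = pm.gp.Latent(cov_func=cov)
    g = gp.prior("g", X=X)                          # GP prior: MVN(g | 0, K)

    f = A + B * g                                   # the latent function from your question

    sigma = pm.HalfNormal("sigma", 1.0)
    # likelihood: N(y | f, sigma) -- observed is set here
    pm.Normal("y_obs", mu=f, sigma=sigma, observed=y)
```

Here pm.Normal("y_obs", ..., observed=y) is the likelihood term, while gp.prior("g", X=X) contributes the multivariate normal prior over the latent function values.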
If the lengthscale of the GP is also a sampled parameter then is it optimised by the likelihood here? i.e does the likelihood contain the term penalising small lengthscales?
No, the likelihood doesn’t, but the prior does. The term responsible for penalizing small lengthscales lives in the GP prior, which is really just a multivariate normal: it is the (log) determinant of the covariance matrix.
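Writing out the log-density of that multivariate normal makes the penalty explicit (here K_\ell is the n x n covariance matrix built from the kernel with lengthscale \ell):

$$
\log \mathrm{MVN}(g \mid 0, K_\ell) = -\tfrac{1}{2}\, g^\top K_\ell^{-1} g \;-\; \tfrac{1}{2} \log \lvert K_\ell \rvert \;-\; \tfrac{n}{2} \log 2\pi
$$

Roughly speaking, for a stationary kernel like ExpQuad, a small lengthscale pushes K_\ell toward a diagonal matrix, which makes \lvert K_\ell \rvert as large as it can be, so the -\tfrac{1}{2} \log \lvert K_\ell \rvert term becomes more negative. A short lengthscale is therefore only favoured when the improvement in the fit term -\tfrac{1}{2} g^\top K_\ell^{-1} g outweighs that penalty.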