Thanks, I really appreciate you replying.
It’s good to know it isn’t automatically doing the latter optimization by default. But to be clear about what I was asking in the first part:
The marginal likelihood of the GP model is (1), where f are the latent function values of the GP, which we integrate out; y is the dependent variable we’re modelling; X are the predictor values; and \theta are the kernel hyper-parameters (l and \eta in your example):
p(y \mid X, \theta) = \int p(y \mid f, X, \theta)\, p(f \mid X, \theta)\, df \ \ \ \ (1)
This integral has a closed form: it is a multivariate normal. This means the posterior for the hyper-parameters \theta is:
p(\theta \mid y, X) \propto p(y \mid X, \theta)\, p(\theta) \ \ \ \ (2)
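To make that concrete, here’s a minimal NumPy sketch of the negative log of (2) as I understand it. Everything specific in it is an assumption for illustration only: a zero-mean GP, a squared-exponential kernel, Gaussian noise sigma folded into the covariance, and standard-normal priors on the log hyper-parameters.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def neg_log_posterior(theta, X, y):
    """Negative log of (2): the negative log of the closed-form MVN
    marginal (1) plus the negative log prior.
    theta = (log l, log eta, log sigma); X, y are 1-D arrays."""
    l, eta, sigma = np.exp(theta)
    # Squared-exponential kernel (a placeholder for whatever kernel you use)
    sqdist = (X[:, None] - X[None, :]) ** 2
    K = eta**2 * np.exp(-0.5 * sqdist / l**2) + sigma**2 * np.eye(len(X))
    # Negative log of the MVN marginal likelihood in (1), zero mean assumed
    L, lower = cho_factor(K, lower=True)
    alpha = cho_solve((L, lower), y)
    nll = 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(X) * np.log(2 * np.pi)
    # Negative log prior p(theta); standard normals on the logs, purely a placeholder
    nlp = 0.5 * np.sum(np.asarray(theta) ** 2)
    return nll + nlp

# MAP would then be the argmin of this function, e.g.:
# from scipy.optimize import minimize
# map_est = minimize(neg_log_posterior, x0=np.zeros(3), args=(X, y))
```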
So my question is: does MAP maximize (2) (or, equivalently, minimize its negative log)?
Comparing what you wrote with what I’ve written, I think the answer is yes.
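For reference, here’s roughly what I mean in code, in case it helps anyone else who lands here. This is only a sketch assuming a PyMC pm.gp.Marginal setup like the one in your example; the data, priors, and kernel are all placeholders I’ve made up (and I believe the keyword was noise= rather than sigma= in older PyMC3):

```python
import numpy as np
import pymc as pm

# Toy data, purely as a placeholder
X = np.linspace(0, 10, 30)[:, None]
y = np.sin(X).ravel() + 0.1 * np.random.randn(30)

with pm.Model() as model:
    # Priors p(theta) on the hyper-parameters (assumed forms)
    l = pm.Gamma("l", alpha=2.0, beta=1.0)
    eta = pm.HalfNormal("eta", sigma=1.0)
    sigma = pm.HalfNormal("sigma", sigma=1.0)

    cov = eta**2 * pm.gp.cov.ExpQuad(1, ls=l)
    gp = pm.gp.Marginal(cov_func=cov)

    # (1): f is integrated out analytically, leaving the MVN marginal p(y | X, theta)
    y_obs = gp.marginal_likelihood("y_obs", X=X, y=y, sigma=sigma)

    # If I understand correctly, this maximizes the log of (2) over theta
    mp = pm.find_MAP()
```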