Gaussian Process Regression level 1 inference: Reproducing the Mauna Loa CO2 Example with PyMC3

Hi @max, I had a look at the page you are referring to in R+W, and I think I more or less see where your confusion is. I will try to clarify below; let me know if I misunderstand you in any way:

This is not correct; the MAP estimation here is of the parameters \boldsymbol{w}. As in the example on page 108: “For example, the parameters (w) could be the parameters in a linear model […]. At the second level are hyperparameters θ which control the distribution of the parameters at the bottom level. For example […] the “ridge” term in ridge regression are hyperparameters.” Taking a simple regression example, Y = X*b + eps: b would be the parameters in this case, and the prior on b would be controlled by the hyperparameters θ. If you assume a flat prior, you have MAP estimate = MLE = least-squares solution.
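To make this concrete, here is a minimal PyMC3 sketch (the data and variable names are made up for illustration, not from the Mauna Loa example) showing that with a flat prior on b, find_MAP recovers the least-squares solution:

```python
import numpy as np
import pymc3 as pm

# Toy data for Y = X*b + eps (all names here are illustrative)
rng = np.random.RandomState(0)
x = rng.normal(size=100)
y = 2.5 * x + rng.normal(scale=0.5, size=100)

with pm.Model():
    b = pm.Flat("b")                        # flat prior on the parameter b
    sigma = pm.HalfNormal("sigma", sigma=1.0)
    pm.Normal("obs", mu=b * x, sigma=sigma, observed=y)
    map_estimate = pm.find_MAP()            # MAP = MLE = least squares here

# Closed-form least-squares estimate for comparison
print(map_estimate["b"], np.dot(x, y) / np.dot(x, x))
```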

Now, getting to GPs: \boldsymbol{w} here is non-parametric, i.e., a function. As explained later in the book (which you also quote), “one can regard the noise-free latent function values at the training inputs \boldsymbol{f} as the parameters”. So, if you want a prior for a value, you need a distribution over values (something that takes the value as input and outputs a probability); similarly, if you want a prior for a function, you need a distribution over functions. It is hard to wrap your head around at first; you can instead think of it as a distribution that takes many values at once (infinitely many, in fact) and outputs a probability. And when you sample from a distribution over functions, you get functions, which is effectively what you are seeing in the doc: Samples from GP Prior.
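For intuition, here is a small NumPy sketch (the kernel and grid choices are arbitrary) of what “sampling a function” means in practice: evaluate the GP prior on a finite grid and draw from the resulting multivariate normal, so each draw is a whole vector of function values at once:

```python
import numpy as np
import matplotlib.pyplot as plt

# RBF / squared-exponential kernel (an assumed choice for this sketch)
def rbf(x1, x2, lengthscale=1.0, variance=1.0):
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

x = np.linspace(0, 10, 200)
K = rbf(x, x) + 1e-8 * np.eye(len(x))   # jitter for numerical stability

# Each draw is a vector of function values at all 200 inputs at once --
# a finite snapshot of one sample from the distribution over functions
samples = np.random.multivariate_normal(np.zeros(len(x)), K, size=5)

for f in samples:
    plt.plot(x, f)
plt.title("Samples from a GP prior")
plt.show()
```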

Of course, writing down a prior for a function directly is not straightforward (after all, there are potentially infinitely many parameters, or as in R+W, “The more training cases there are, the more parameters.”), so instead we put priors on the hyperparameters θ (the parameters of the covariance function), i.e., the parameters that generate the functions.

I hope this point is now clearer: inference on a GP is automatically level 2, as we cannot put a prior on w but need to put priors on the hyperparameters θ. This also connects to the previous answer: even when we just find the MAP or MLE of the covariance-function parameters (i.e., θ), there is still a prior there (i.e., a flat prior).
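As a sketch of what that level-2 setup looks like in PyMC3 (the priors and data below are placeholders, not the ones from the Mauna Loa example), the priors go on the covariance-function hyperparameters θ, while the latent f is marginalized out analytically:

```python
import numpy as np
import pymc3 as pm

X = np.linspace(0, 10, 30)[:, None]
y = np.sin(X).ravel() + 0.1 * np.random.randn(30)  # toy data

with pm.Model() as gp_model:
    # Priors on the hyperparameters theta of the covariance function
    ls = pm.Gamma("ls", alpha=2, beta=1)        # lengthscale
    eta = pm.HalfNormal("eta", sigma=1)         # amplitude
    sigma = pm.HalfNormal("sigma", sigma=0.5)   # observation noise
    cov = eta ** 2 * pm.gp.cov.ExpQuad(1, ls)

    # Marginal GP: the latent function values f are integrated out
    gp = pm.gp.Marginal(cov_func=cov)
    gp.marginal_likelihood("y_obs", X=X, y=y, noise=sigma)

    trace = pm.sample(1000, tune=1000)          # full posterior over theta
```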

You can do inference via MAP estimation in PyMC3 (using find_MAP); the problem is just that in higher dimensions the result will not accurately reflect the posterior.
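For completeness, a quick sketch reusing the illustrative gp_model above: find_MAP gives point estimates of θ, which is often fine in low dimensions, but the mode can sit far from the bulk of the posterior mass when θ is high-dimensional:

```python
with gp_model:
    # Point estimates of the hyperparameters; cheap, but the mode may not
    # be representative of the full posterior in higher dimensions
    mp = pm.find_MAP()

print(mp["ls"], mp["eta"], mp["sigma"])
```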
