Kernel hyper-parameter priors and their use in MAP estimates of Gaussian Processes

Note that find_MAP maximizes the posterior distribution, not the marginal posterior distribution; the two are different. There is a bit more discussion on this in this thread: Gaussian Process Regression level 1 inference: Re-producing Mauna Loa CO2 Example with PyMC3
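
To make the distinction concrete, here is a toy sketch in plain NumPy (not PyMC3). It assumes the point above is about the mode of the joint posterior versus the mode of a marginal; the two-component mixture density, the grid, and all variable names are made up for illustration and are not taken from the Mauna Loa example. A narrow, tall peak holds little probability mass while a broad, low peak holds most of it, so the joint maximum and the marginal maximum land in different places.

```python
import numpy as np

def gauss2d(x, y, mx, my, s):
    """Isotropic 2D Gaussian density with mean (mx, my) and std s."""
    return np.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2 * s ** 2)) / (2 * np.pi * s ** 2)

# Hypothetical "posterior" over two parameters, evaluated on a grid.
grid = np.linspace(-5.0, 5.0, 201)
X, Y = np.meshgrid(grid, grid, indexing="ij")

# 10% of the mass in a sharp spike at (-2, -2), 90% in a broad bump at (2, 2).
joint = 0.1 * gauss2d(X, Y, -2.0, -2.0, 0.2) + 0.9 * gauss2d(X, Y, 2.0, 2.0, 1.0)

# Mode of the joint density: the kind of point a joint maximizer would find.
i, j = np.unravel_index(np.argmax(joint), joint.shape)
joint_mode = (grid[i], grid[j])

# Marginal over the first parameter: integrate the second one out (Riemann sum).
dy = grid[1] - grid[0]
marginal_x = joint.sum(axis=1) * dy
marginal_mode_x = grid[np.argmax(marginal_x)]

print("joint mode:", joint_mode)
print("mode of the marginal in x:", marginal_mode_x)
```

The joint mode sits on the sharp spike, while the marginal mode sits under the broad bump, because integrating out the second parameter rewards mass rather than height. A point estimate from a joint maximizer can therefore differ noticeably from what the marginals of a sampled posterior suggest.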