GP regularization effect

It looks like you’re trying to do MAP estimation here? Even with regularisation, that typically doesn’t work unless you marginalise out the GP, because the posterior mode may well sit at these pathological points. A similar thing happens in hierarchical models, where letting the prior variance of the random effects go to zero produces a spike in the posterior density (there’s a bit of a discussion of that under “Why hierarchical models are Bayesian” in the post “Why hierarchical models are awesome, tricky, and Bayesian” on the While My MCMC Gently Samples blog). I’ll try to dig up more references about this, but in short I’d suggest sampling with NUTS instead, or maybe trying variational inference.
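To make the hierarchical-model analogy concrete, here is a rough sketch in plain Python (not your model, and not PyMC code). It assumes a toy setup that I made up for illustration: effects `theta_i ~ Normal(0, sigma)`, a `HalfNormal(1)` prior on `sigma`, observations `y_i ~ Normal(theta_i, 1)`, and arbitrary values for `y`. All function names are hypothetical helpers. It evaluates the joint log density with every effect collapsed to zero while `sigma` shrinks.

```python
import math

def normal_logpdf(x, mu, sigma):
    # Log density of Normal(mu, sigma) evaluated at x.
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def half_normal_logpdf(sigma, scale=1.0):
    # Log density of HalfNormal(scale) for sigma > 0 (assumed prior, for illustration).
    return math.log(2.0) + normal_logpdf(sigma, 0.0, scale)

def log_joint(theta, sigma, y):
    # log p(sigma) + sum_i log p(theta_i | sigma) + sum_i log p(y_i | theta_i)
    lp = half_normal_logpdf(sigma)
    lp += sum(normal_logpdf(t, 0.0, sigma) for t in theta)
    lp += sum(normal_logpdf(yi, t, 1.0) for yi, t in zip(y, theta))
    return lp

# Arbitrary toy observations, not taken from any real data set.
y = [0.8, -1.1, 0.3, 1.5, -0.4]

# Collapse every random effect onto the prior mean and shrink sigma.
collapsed = [0.0] * len(y)
sigmas = [1.0, 1e-1, 1e-2, 1e-4, 1e-8]
log_joints = [log_joint(collapsed, s, y) for s in sigmas]

for s, lp in zip(sigmas, log_joints):
    print(f"sigma={s:g}  log joint={lp:.2f}")
```

The likelihood term stays fixed while the `-n * log(sigma)` term from the effects grows without bound, so an optimiser chasing the joint mode gets pulled towards `sigma = 0`. Sampling (or marginalising out the latent variables) avoids this because that spike carries very little posterior mass.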
