Learning GP prior covariance hyperparameters -- why does StudentT noise model perform better than Normal?

Unfortunately @leka0024 never got a reply here, but I have more or less the same questions.

Maybe someone can help answer them.

My best guesses would be:

  1. StudentT likelihoods have “nicer” gradients when the prediction is far away from the observed value. As x \to \infty, logpdf_T(x) decays only logarithmically, so its gradient flattens out towards zero, whereas logpdf_{Normal}(x) keeps curving downwards quadratically and its gradient grows without bound. I could imagine the Normal's ever-steepening gradient being a reason for divergences? (A quick numerical sketch of the two gradients follows below.)

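To make guess 1 concrete, here is a minimal Python sketch (numpy/scipy only, purely illustrative; the choice nu = 4 is an arbitrary assumption) comparing the score, i.e. the gradient of the log-density with respect to x, for a standard Normal and a StudentT as the residual grows:

```python
import numpy as np
from scipy.stats import t

# Score (gradient of the log-density w.r.t. x) for the two likelihoods:
#   Normal:    d/dx log N(x | 0, 1)   = -x                              (unbounded)
#   StudentT:  d/dx log T(x | nu)     = -(nu + 1) * x / (nu + x**2)     (-> 0)

nu = 4.0  # degrees of freedom; arbitrary value for illustration
x = np.array([1.0, 5.0, 20.0, 100.0])

grad_normal = -x
grad_t = -(nu + 1.0) * x / (nu + x**2)

for xi, gn, gt in zip(x, grad_normal, grad_t):
    print(f"x = {xi:6.1f}   Normal score = {gn:8.1f}   StudentT score = {gt:8.4f}")

# Sanity check: central finite difference of scipy's StudentT logpdf
eps = 1e-6
num_grad = (t.logpdf(x + eps, df=nu) - t.logpdf(x - eps, df=nu)) / (2 * eps)
print("numerical StudentT score:", num_grad)
```

The Normal score keeps growing linearly with the residual, while the StudentT score peaks and then shrinks back towards zero, which would match the intuition that the StudentT gives the sampler gentler gradients for points that sit far from the current prediction.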
cheers
