Learning GP prior covariance hyperparameters -- why does a StudentT noise model perform better than a Normal one?

The original pymc3 example notebook is here: https://docs.pymc.io/notebooks/GP-Latent.html

This is my notebook: https://github.com/leka0024/pymc3/blob/master/latent_GPprior_covHyper.ipynb
My main goal is to learn the GP prior covariance hyperparameters. I don’t care about the noise model hyperparameters, except insofar as learning them helps recover the covariance hyperparameters better.

The four cases in my notebook come from crossing two choices: a StudentT (as in the pymc3 example notebook) or a Normal for the noise model, and either fixing the noise hyperparameters at their true values or putting priors on them and learning them too (though that isn’t really my goal).
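
For concreteness, here is a minimal sketch of what I mean (hypothetical data and priors, not the exact code from my notebook; the gp.Latent setup follows the linked example):

```python
import numpy as np
import pymc3 as pm

# Hypothetical synthetic data standing in for the notebook's setup
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50)[:, None]
f_true = np.sin(X).ravel()
y = f_true + 0.3 * rng.standard_normal(50)

with pm.Model() as model:
    # GP prior covariance hyperparameters -- these are what I actually care about
    ell = pm.Gamma("ell", alpha=2, beta=1)
    eta = pm.HalfNormal("eta", sigma=2)
    cov = eta**2 * pm.gp.cov.ExpQuad(1, ell)

    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", X=X)  # latent GP values (theta in the math below)

    # Case: StudentT noise model with priors on its hyperparameters
    sigma = pm.HalfNormal("sigma", sigma=1)
    nu = pm.Gamma("nu", alpha=2, beta=0.1)
    y_ = pm.StudentT("y_obs", mu=f, sigma=sigma, nu=nu, observed=y)

    # The other cases swap this likelihood out, e.g. Normal with the true noise level:
    # y_ = pm.Normal("y_obs", mu=f, sigma=0.3, observed=y)

    trace = pm.sample(1000, tune=1000, target_accept=0.9)
```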

The math, a hierarchical model I believe: $p(\theta, \phi \mid y) \propto p(y \mid \theta)\, p(\theta \mid \phi)\, p(\phi)$
$y$ - the noisy data points, $\theta$ - the latent GP values, $\phi$ - the hyperparameters
$p(y \mid \theta)$ - noise model, $p(\theta \mid \phi)$ - GP prior, $p(\phi)$ - priors on the hyperparameters
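
Written out more explicitly (my reading, with $K_\phi(X, X)$ the kernel covariance matrix built from the hyperparameters $\phi$):

$$
p(\theta, \phi \mid y) \;\propto\;
\underbrace{\prod_{i=1}^{n} p(y_i \mid \theta_i)}_{\text{noise model (Normal or StudentT)}}\;
\underbrace{\mathcal{N}\!\left(\theta \mid 0,\, K_\phi(X, X)\right)}_{\text{GP prior}}\;
\underbrace{p(\phi)}_{\text{hyperpriors}}
$$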

Observations/questions:

  1. Why does the StudentT noise model work better than the Normal? It learns the covariance hyperparameters better, regardless of whether the noise hyperparameters are fixed or learned. In fact, when using the Normal with learned noise hyperparameters, there are stretches of divergences in the main trace.
  2. I know there is also a gp.Marginal implementation that might handle the Normal noise case better than the way I’ve done it, but I don’t see why this way (the “manual” way, perhaps) shouldn’t work just as well?
  3. Does it make sense in this scenario to use just a univariate StudentT or Normal, instead of MvStudentT or MvNormal? My understanding is that a GP is essentially an MvNormal whose dimension equals the length/size of the support (might be the wrong word) -- see the quick check after this list.
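
A quick sanity check with made-up numbers on my understanding for #3: given the latent values, independent noise means the likelihood factorizes, so a product of univariate Normals is exactly an MvNormal with diagonal covariance.

```python
import numpy as np
from scipy import stats

# Made-up values: latent GP values f and noisy observations y at 5 inputs
rng = np.random.default_rng(1)
f = rng.standard_normal(5)
y = f + 0.3 * rng.standard_normal(5)
sigma = 0.3

# Sum of univariate Normal log-densities vs. one MvNormal with diagonal covariance
lp_univariate = stats.norm(loc=f, scale=sigma).logpdf(y).sum()
lp_mvnormal = stats.multivariate_normal(mean=f, cov=sigma**2 * np.eye(5)).logpdf(y)
print(np.allclose(lp_univariate, lp_mvnormal))  # True
```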

I’d appreciate any insights, especially mathematical ones on #3! Thank you


Unfortunately @leka0024 never got a reply here, but I have more or less the same questions.

Maybe someone can help answer them.

My best guesses would be:

  1. StudentT likelihoods have “nicer” gradients when the prediction is far away from the maximum. As $x \to \infty$, $\log\mathrm{pdf}_T(x)$ flattens out (it decays only logarithmically, so its gradient goes to zero), whereas $\log\mathrm{pdf}_{\mathrm{Normal}}(x)$ keeps curving downwards quadratically. I could imagine this being a reason for the divergences (rough illustration below)?
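
A rough numerical illustration of that guess (made-up residuals, StudentT with df=3 as an example): the slope of the Normal log-pdf grows linearly with the residual, while the StudentT slope stays bounded and goes to zero.

```python
import numpy as np
from scipy import stats

# Central-difference gradients of the log-pdfs at increasingly large residuals
x = np.array([1.0, 5.0, 20.0, 100.0])
eps = 1e-6

def grad(logpdf, x):
    return (logpdf(x + eps) - logpdf(x - eps)) / (2 * eps)

print(grad(stats.norm(0, 1).logpdf, x))  # roughly -x: keeps growing in magnitude
print(grad(stats.t(df=3).logpdf, x))     # bounded, tends to 0 for large residuals
```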

cheers
