Hey there, I have a question for a GP expert: I recently read about the Student-T process as a more robust alternative to the ordinary Gaussian process. I also read that Student-T processes are not additive, which, if I understand correctly, means that the following is incorrect:
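If $f_1 \sim \mathcal{TP}(\nu, \mu_1, K_1)$ and $f_2 \sim \mathcal{TP}(\nu, \mu_2, K_2)$ are independent, then $f_1 + f_2 \sim \mathcal{TP}(\nu, \mu_1 + \mu_2, K_1 + K_2)$, i.e. the analogue of how independent GPs add (at least, that is the property I understand "additivity" to refer to here).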
In my case, I want to model overdispersed count data, which is why I chose a Negative Binomial distribution as the likelihood. Since the degree of dispersion in my data can vary, I would like to model both distribution parameters ($\mu$ and $\alpha$) as two (independent) GPs.
Question: Is it theoretically possible to model both parameters $\mu$ and $\alpha$ with (more robust) Student-T processes, or does this clash with the concept of additivity? As far as I understand, it should not be a problem, but I would like to be sure.
I was inspired by Goldberg et al. (1998), who modeled $\mu$ and $\sigma$ of a Gaussian likelihood using two independent GPs. The PyMC example gallery also features such a case for modelling heteroskedastic GPs. So I naively assumed that this approach would work for an NB likelihood as well?
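For concreteness, here is a stripped-down sketch of the kind of model I have in mind (dummy data and placeholder priors, and in my real model I would swap the latent GPs for TPs; this is not my actual code):

```python
import numpy as np
import pymc as pm

# dummy inputs and counts, just to make the sketch self-contained
X = np.linspace(0, 10, 50)[:, None]
y = np.random.poisson(5, size=50)

with pm.Model():
    # first latent GP, for log(mu)
    ls_mu = pm.Gamma("ls_mu", alpha=2, beta=1)
    gp_mu = pm.gp.Latent(cov_func=pm.gp.cov.Matern32(1, ls=ls_mu))
    log_mu = gp_mu.prior("log_mu", X=X)

    # second, independent latent GP, for log(alpha)
    ls_alpha = pm.Gamma("ls_alpha", alpha=2, beta=1)
    gp_alpha = pm.gp.Latent(cov_func=pm.gp.cov.Matern32(1, ls=ls_alpha))
    log_alpha = gp_alpha.prior("log_alpha", X=X)

    # Negative Binomial likelihood whose mu and alpha both vary over X
    pm.NegativeBinomial("y", mu=pm.math.exp(log_mu),
                        alpha=pm.math.exp(log_alpha), observed=y)
```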
Agree with @ricardoV94, but if you have a long lengthscale the GP isn't that flexible. The posterior of that parameter should help you see if it's making sense.
I'm not sure I'm following your point about additivity. You can certainly add two TPs together, but maybe their covariances don't add (I'd have to check your refs). Also, using two independent GPs/TPs for mu and alpha isn't adding them together, so there's nothing to worry about there.
Thanks to both of you @bwengals and @ricardoV94 !
Indeed, looking at the posteriors, it seems that the model isn't picking up any spatial structure in the alpha parameter. I guess that indicates that using a Student-T distribution for modelling alpha is sufficient, without applying a fully-fledged GP/TP? It's a bit curious, however, since I can identify spatial structure when plotting the alpha computed from my (log) observed data…
Figure: Negative Binomial prior vs posterior:
Both parameters, mu and alpha, are represented here as TPs whose covariance function is the sum of two Matern32 kernels with a short and a long lengthscale, respectively; a rough code sketch of this covariance follows the legend below. (Note: the four colors represent four models describing different scenarios that I want to compare; the results are pretty similar for all models.)
left column → mu-related distributions
right column → alpha-related distributions (here denoted as phi)
first row → respective mean functions
2nd & 3rd rows → variance (eta) of kernel parts 1 and 2
4th & 5th rows → length scale (rho) of kernel parts 1 and 2
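For completeness, the covariance I described above looks roughly like this (hypothetical variable names and priors, only meant to show the sum-of-kernels structure, not my actual values):

```python
import pymc as pm

with pm.Model():
    # amplitudes (eta) and lengthscales (rho) for the two kernel parts
    eta1 = pm.HalfNormal("eta1", sigma=1)
    eta2 = pm.HalfNormal("eta2", sigma=1)
    rho1 = pm.Gamma("rho1", alpha=2, beta=2)    # short-lengthscale part
    rho2 = pm.Gamma("rho2", alpha=2, beta=0.2)  # long-lengthscale part

    # sum of a short-range and a long-range Matern32 component
    # (input_dim=1 here as a placeholder; it would be 2 for spatial coordinates)
    cov = (eta1**2 * pm.gp.cov.Matern32(1, ls=rho1)
           + eta2**2 * pm.gp.cov.Matern32(1, ls=rho2))
```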
You might want to be a bit careful with your priors for the alpha parameters. The way it's parameterized in PyMC, when alpha → infinity the Negative Binomial likelihood goes to Poisson. So, putting a prior like
log_alpha ~ TP(0, K) # or student-t without spatial dependency
alpha = exp(log_alpha)
will cause your prior to push you towards more overdispersion, which probably isn't what you want. To fix this, you could try putting the prior on the inverse, so it shrinks in the other direction. That might help, because alpha is probably hard to identify unless you have a lot of data, and that problem will be exacerbated when you include spatial dependence.
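Roughly something like this (just a sketch, with a plain Student-T latent as a placeholder; you could put the GP/TP on the log of the inverse instead if you want to keep the spatial dependence):

```python
import pymc as pm

with pm.Model():
    # latent prior on log(1/alpha) instead of log(alpha)
    log_inv_alpha = pm.StudentT("log_inv_alpha", nu=3, mu=0, sigma=1)

    # alpha = 1 / exp(log_inv_alpha), i.e. the transformation is flipped
    # relative to the alpha = exp(log_alpha) version above
    alpha = pm.Deterministic("alpha", 1.0 / pm.math.exp(log_inv_alpha))
```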