Using Student T processes and additivity

nrieger · February 13, 2024, 10:25am

Hey there, I have a question for a GP expert: I recently read about the more robust Student T process as an alternative to the ordinary Gaussian process. I read that Student T processes are not additive, meaning, according to my understanding, the following is incorrect:

tp1 = pm.gp.TP(mean_func1, cov_func1)
tp2 = pm.gp.TP(mean_func2, cov_func2)
tp3 = tp1 + tp2  # <-- wrong!

In my case, I want to model overdispersed count data, which is why I chose a Negative Binomial distribution as the likelihood. Since the degree of dispersion in my data can vary, I would like to model both distribution parameters (\mu and \alpha) as two (independent) GPs.

Question: Is it theoretically possible to model both parameters \mu and \alpha with a more robust Student T process, or does this clash with the concept of additivity? According to my understanding, it should not be a problem, but I would like to be sure.

Thank you!

ricardoV94 · February 13, 2024, 8:51pm

Doesn’t sound like a problem but I wonder how identifiable the alpha parameter is with a flexible prior such as a GP (or TP)

nrieger · February 13, 2024, 9:34pm

Thanks @ricardoV94 for the feedback!

I was inspired by Goldberg et al. 1998, which modeled mu and sigma of a Gaussian likelihood using two independent GPs. The PyMC example gallery also features such a case for modelling heteroskedastic GPs. So I naively assumed that this approach would work for NB likelihoods as well?

ricardoV94 · February 14, 2024, 7:43am

I have the feeling the NB alpha is a bit more fuzzy than the Normal standard deviation, but I’m curious to see if it works out!

bwengals · February 14, 2024, 11:30pm

Agree with @ricardoV94, but, if you have a long lengthscale the GP isn’t that flexible. The posterior of that parameter should help you see if it’s making sense.

I’m not sure if I’m following your point about additivity. You can certainly add two TPs together, but maybe their covariances don’t add (would have to check your refs). Also, using two independent GPs/TPs for mu and alpha isn’t adding them, so nothing to worry about there.
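One way to probe the additivity point numerically is to look at a single marginal. At any one input location a TP has a Student-t marginal, so the sum of two independent TPs has the sum of two independent Student-t variables as its marginal there. The sketch below uses only standard moment formulas (the helper names are made up for illustration) and compares excess kurtosis. It only shows that the sum cannot be a Student-t with the same nu; it does not by itself rule out other values of nu.

```python
def t_excess_kurtosis(nu):
    """Excess kurtosis of a Student-t with nu degrees of freedom (finite only for nu > 4)."""
    if nu <= 4:
        raise ValueError("excess kurtosis is only finite for nu > 4")
    return 6.0 / (nu - 4.0)


def sum_excess_kurtosis(k1, var1, k2, var2):
    """Excess kurtosis of X1 + X2 for independent X1, X2.

    Variances add, and fourth cumulants (excess kurtosis * variance**2) add.
    """
    return (k1 * var1**2 + k2 * var2**2) / (var1 + var2) ** 2


nu = 6.0
var = nu / (nu - 2.0)            # variance of a standard Student-t
k_single = t_excess_kurtosis(nu)
k_sum = sum_excess_kurtosis(k_single, var, k_single, var)

print("single t, nu=6 :", k_single)
print("sum of two     :", k_sum)
```

A Student-t with nu = 6 always has the same excess kurtosis regardless of its scale, so the mismatch between the two printed values means the sum has left that family. For Gaussians both numbers would be zero, which is consistent with GPs being closed under addition.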

nrieger · February 15, 2024, 11:11am

Thanks to both of you @bwengals and @ricardoV94 !
Indeed looking at the posteriors it seems that the model isn’t picking up any spatial structure in the alpha parameter. I guess that indicates that using a Student T distribution for modelling alpha is sufficient without applying a fully fledged GP/TP? It’s a bit curious, however, since I can identify spatial structures when plotting the alpha computed for my (log) observed data…

Figure: Negative Binomial prior vs posterior:
Both parameters mu and alpha are represented here as a TP where the covariance function is given as the sum of two Matern32 kernels, with a short and long length scale, respectively. (Note: the four colors represent four models describing different scenarios that I want to compare; the results are pretty similar though for all models)

left column → mu related distributions
right column → alpha (here denoted as phi) related distributions

first row → respective mean functions
2nd & 3rd row → variance (eta) of kernel part 1 and 2
4th & 5th row → length scale (rho) of kernel part 1 and 2
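For readers who want to see what this setup implies generatively, here is a rough NumPy-only prior simulation. It is a sketch, not the PyMC model used for the figure: the helper names (matern32, draw_tp), the hyperparameter values, the 1-D input grid, and the log link for both parameters are all assumptions made for illustration. The kernel is the sum of a short and a long length-scale Matern32 term, and each latent function is drawn as a multivariate Student-t by rescaling a Gaussian draw with one shared chi-square variable.

```python
import numpy as np


def matern32(x1, x2, eta, rho):
    """Matern 3/2 kernel with amplitude eta and length scale rho (1-D inputs)."""
    r = np.abs(x1[:, None] - x2[None, :])
    s = np.sqrt(3.0) * r / rho
    return eta**2 * (1.0 + s) * np.exp(-s)


def draw_tp(rng, mean, K, nu):
    """One draw from a multivariate Student-t with scale matrix K.

    A single chi-square variable is shared by all points; that shared
    scaling is what distinguishes a TP draw from independent t draws.
    """
    n = K.shape[0]
    L = np.linalg.cholesky(K + 1e-8 * np.eye(n))  # small jitter for stability
    z = L @ rng.standard_normal(n)
    u = rng.chisquare(nu)
    return mean + z * np.sqrt(nu / u)


rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)

# Sum of a short and a long length-scale Matern32 kernel (values are made up).
K = matern32(x, x, eta=0.3, rho=0.5) + matern32(x, x, eta=0.7, rho=5.0)

# Two independent latent functions, one per likelihood parameter.
log_mu = draw_tp(rng, 1.0, K, nu=5.0)
log_alpha = draw_tp(rng, 0.0, K, nu=5.0)
mu, alpha = np.exp(log_mu), np.exp(log_alpha)

# NumPy uses the (n, p) parameterization: n = alpha, p = alpha / (alpha + mu).
y = rng.negative_binomial(alpha, alpha / (alpha + mu))
print(y[:10])
```

Repeating the draw for different seeds gives a feel for how much spatial structure in alpha is visible in the counts at all, which relates to the identifiability concern raised above.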

bwengals · February 22, 2024, 9:46pm

You might want to be a bit careful with your priors for the alpha parameters. The way it’s parameterized in PyMC, when alpha → infinity, the negative-binomial likelihood goes to Poisson. So, putting a prior like

log_alpha ~ TP(0, K)  # or student-t without spatial dependency
alpha = exp(log_alpha)

will cause your prior to push you towards more overdispersion, which probably isn’t what you want. To fix this, you could try putting the prior on the inverse, so it shrinks in the other direction. That might help, because alpha is probably hard to identify unless you have a lot of data, and that problem will be exacerbated when you include spatial dependence.
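The limiting behaviour can be checked with a few lines of plain Python. This is a minimal sketch assuming the (mu, alpha) parameterization described above, where the variance is mu + mu^2 / alpha; the function names are made up for illustration and nothing here calls PyMC.

```python
import math


def nb_logpmf(y, mu, alpha):
    """Negative binomial log-pmf in the (mu, alpha) parameterization."""
    return (
        math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
        - alpha * math.log1p(mu / alpha)             # alpha * log(alpha / (mu + alpha))
        + y * (math.log(mu) - math.log(mu + alpha))  # y * log(mu / (mu + alpha))
    )


def poisson_logpmf(y, mu):
    """Poisson log-pmf with rate mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def nb_variance(mu, alpha):
    """Variance mu + mu**2 / alpha; the second term is the overdispersion."""
    return mu + mu**2 / alpha


def max_pmf_gap(mu, alpha, y_max=30):
    """Largest absolute pmf difference between NB and Poisson on 0..y_max-1."""
    return max(
        abs(math.exp(nb_logpmf(y, mu, alpha)) - math.exp(poisson_logpmf(y, mu)))
        for y in range(y_max)
    )


mu = 4.0
for alpha in (0.5, 5.0, 50.0, 1e6):
    print(f"alpha={alpha:>9}: variance={nb_variance(mu, alpha):.4f}, "
          f"max pmf gap to Poisson={max_pmf_gap(mu, alpha):.2e}")
```

Written in terms of the inverse, phi = 1 / alpha, the variance is mu + phi * mu^2, so a prior that shrinks phi towards zero shrinks the model towards the Poisson case rather than away from it.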

nrieger · February 23, 2024, 10:41am

that’s interesting, thanks a lot @bwengals ! I will play around with the prior a bit to see how sensitive the results are to changing this parameter.
