Mean of T process

Hi!
I use Gaussian processes to model a time series like this (see image row 2).
The blue dots are measurements, and the GP fit is shown as an orange line with 68% and 95% regions.
I use an unknown constant mean function c, and as you can see in this traceplot of 3 chains (see image row 1), the constant value is very reasonable - about 0.5-0.6.

Then I fit a T process to compare results. The predicted mean is similar to that of the GP, and the uncertainties seem even better (see image row 4).

However, the mean value now seems off (see image row 3): its posterior is very spread out and closer to 1. The priors are the same in both cases (c ~ HalfNormal(0.5)).

Is this something specific to T processes, or to their implementation in PyMC?

Here are all the plots in a single image, as I can't upload them separately here:

Did you try using a zero mean function? In most cases it doesn't make a big difference where you have data; the difference shows up where you make predictions far away from your observations (e.g., in your case, predicting the value at x=100 would give a Gaussian with mean=c). But I agree that the mean value seems off. Could you please post your code and data?
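For reference, a minimal sketch of the two mean options (a sketch only - gp.Latent defaults to a zero mean, and the priors here are just illustrative):

import pymc3 as pm

with pm.Model() as m:
    c = pm.Normal("c", mu=0, sd=0.5)
    η = pm.HalfNormal("η", sd=1)
    ℓ = pm.HalfNormal("ℓ", sd=0.5)
    cov = η**2 * pm.gp.cov.ExpQuad(1, ℓ)

    # zero mean (the default): conditionals far from the data revert to 0
    gp_zero = pm.gp.Latent(mean_func=pm.gp.mean.Zero(), cov_func=cov)
    # constant mean: conditionals far from the data revert to c
    gp_const = pm.gp.Latent(mean_func=pm.gp.mean.Constant(c), cov_func=cov)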

Yes, I tried a zero mean. But in some cases I fit a linear mean function and need its parameters - with T processes they come out entirely off. I'll post code and data, but the behavior is similar for basically any data.

That is an odd result, since the T process is implemented similarly to gp.Latent. You could also try a GP with a Student-T likelihood. I think it's good to use a constant mean in a case like this.
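Roughly like this (a sketch only - it assumes some gp = pm.gp.Latent(...) latent function and inputs X, y, and uses the sd parametrization of pm.StudentT):

with model:
    f = gp.prior("f", X=X)  # latent GP values at the inputs
    σ = pm.HalfNormal("σ", sd=0.5)
    ν = pm.Gamma("ν", alpha=2, beta=0.1)
    # heavy-tailed Student-T observation model instead of a Normal
    y_ = pm.StudentT("y", mu=f, nu=ν, sd=σ, observed=y)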

What happens if you divide your data by its mean before fitting the T process? Does the constant parameter still get dragged too high?
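For example (assuming y_train is a NumPy array):

y_scaled = y_train / y_train.mean()  # a well-behaved constant mean should now sit near 1
# ... then refit the TP on y_scaled and check where the posterior of c ends up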

I forgot to mention it, but I used a GP with a Student-T likelihood, same as for the TP.

And here is the whole code:

import numpy as np
import pymc3 as pm

with pm.Model() as model:
    # unknown constant mean and covariance hyperparameters
    c = pm.Normal('c', mu=0, sd=0.5)
    η = pm.HalfNormal("η", sd=1)    # marginal standard deviation
    ℓ = pm.HalfNormal("ℓ", sd=0.5)  # lengthscale
    # degrees of freedom: ν[0] for the TP, ν[1] for the likelihood
    ν = pm.Gamma("ν", alpha=2, beta=0.1, shape=2)

    cov = η**2 * pm.gp.cov.ExpQuad(1, ℓ)
    gp = pm.gp.Latent(mean_func=pm.gp.mean.Constant(c),
                      cov_func=cov)
    # comment out the line above and uncomment below to switch from GP to TP
    # gp = pm.gp.TP(mean_func=pm.gp.mean.Constant(c),
    #               cov_func=cov,
    #               nu=ν[0])
    f = gp.prior('f', X=x_train[:, None])

    # Student-T observation model
    σ = pm.HalfNormal("σ", sd=0.5)
    y_ = pm.StudentT("y", mu=f, lam=1.0/σ, nu=ν[1], observed=y_train)

    trace = pm.sample(njobs=3)  # 3 chains

    # predict on a grid extending one unit beyond the data
    x_pred = np.linspace(x_train.min() - 1, x_train.max() + 1, 100)
    y_pred = gp.conditional('y_pred', Xnew=x_pred[:, None])
    samples = pm.sample_ppc(trace, vars=[y_pred], samples=1000)

and then

import matplotlib.pyplot as plt

plt.plot(x_train, y_train, '.')  # data
l, = plt.plot(x_pred, np.median(samples['y_pred'], 0))  # posterior predictive median
# 68% and 95% predictive bands
plt.fill_between(x_pred, *np.percentile(samples['y_pred'], [16, 84], 0), alpha=0.4, color=l.get_color())
plt.fill_between(x_pred, *np.percentile(samples['y_pred'], [2.5, 97.5], 0), alpha=0.2, color=l.get_color())

to get the prediction plots from the first post. The data:

x_train = np.array([ -2.91991786,  -2.75564682,  -2.65434634,  -2.62149213,
        -2.44079398,  -2.04106776,  -1.89322382,  -1.71526352,
        -1.52361396,  -1.39493498,  -1.25256674,  -1.03080082,
        -0.71594798,  -0.64750171,  -0.03422313,   0.0807666 ,
         0.19575633,   0.38740589,   0.80903491,   1.19233402,
         1.26899384,   1.50718686,   1.82477755,   2.04106776,
         2.04106776,   2.17522245,   2.34770705,   2.55852156,
         2.73100616,   3.19096509,   3.34428474,   3.45927447,
         3.51676934,   3.70841889,   5.51540041,   6.5229295 ,
         6.96098563,   7.48117728,   7.5797399 ,   7.9247091 ,
         8.36550308,  10.47364819,  13.02258727])
y_train = np.array([ 0.70423359,  0.72200438,  0.77554082,  0.70194589,  0.64435481,
        0.99989352,  0.66998084,  0.55096063,  0.50594506,  0.54125961,
        0.56752747,  0.37616882,  0.43905362,  0.5621031 ,  0.47422528,
        0.51995446,  0.47248769,  0.59128824,  0.69214369,  0.55897017,
        0.66306419,  0.61727744,  0.56153796,  0.50519954,  0.46369249,
        0.45691936,  0.60405043,  0.44529634,  0.52326453,  0.28551515,
        0.37393222,  0.6092491 ,  0.52784151,  0.61273912,  0.64628011,
        0.5829985 ,  0.76831989,  0.17223425,  1.25646104,  0.81781078,
        0.44856902,  0.55787261,  0.33272409])

As for dividing by the mean - surprisingly, it gave about the same value for c (a bit below 1). I'm not very familiar with T processes and can't guess at the possible reasons - what do you think?

Hmmm, I cannot see any obvious problem here, but there is a clear bias in gp.TP. @bwengals?

I’m not spotting anything obvious either. Will look into it further and get back… Thanks for posting your code+data!