Latent Gaussian process prediction problem

Hi,

I am trying to replicate the latent Gaussian process example from the docs: Latent Variable Implementation — PyMC3 3.11.4 documentation.

The main question I have is why my prediction results (based on sampling) are very different from the dataset.

This is my generated dataset

This is my Gaussian process prediction

The full code is implemented in Colab.
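
In case it helps to read the snippets below without opening the notebook: x, y and xmax come from the data-generation cell in Colab. A minimal sketch of that kind of setup could look like the following (the exact latent function and noise level are placeholders here, the real ones are in the notebook):

import numpy as np
import pymc3 as pm
import matplotlib.pyplot as plt

# Synthetic 1-D dataset: x, y and xmax are used throughout the snippets below.
np.random.seed(0)
xmax = 10
x = np.linspace(0, xmax, 50)
f_true = 2 * np.sin(2 * np.pi * x / xmax)   # placeholder latent function
y = f_true + 0.3 * np.random.randn(len(x))  # noisy observations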

#define the model
gp_model = pm.Model()

#define the GP prior
with gp_model:
  # Specify the covariance function.
  l = pm.Gamma("l", alpha=2, beta=1)
  cov_func = pm.gp.cov.Matern52(1, ls=l)
  # Specify the GP.  The default mean function is `Zero`.
  gp = pm.gp.Latent(cov_func=cov_func)
  # Place a GP prior over the function f.
  f = gp.prior("f", X=x[:,None])

#define the likelihood
with gp_model:
  #specify the noise sigma
  sigma = pm.HalfCauchy("sigma", beta=5)
  #specify the likelihood
  y_observed = pm.Normal("y_observed", mu=f, sigma=sigma, observed=y[:,None])

#make inference based on MCMC
with gp_model:
  #specify the sampler
  step = pm.HamiltonianMC()
  #sample the posterior distribution
  trace = pm.sample(2000, step=step, return_inferencedata=False)

# Burn-in and thinning of the trace
BURN_IN = 1000
THINNING = 2
trace = trace[BURN_IN::THINNING]

with gp_model:
  pm.plot_trace(trace, figsize=(5, 5))
[trace plot of the posteriors of l and sigma]


x_star = np.linspace(0, xmax, 100)[:, None]
with gp_model:
  f_star = gp.conditional("f_star", x_star)

with gp_model:
  prediction = pm.sample_posterior_predictive(trace, var_names=['f_star'])

plt.scatter(x, y)
plt.plot(x_star.flatten(), prediction['f_star'].mean(axis=0))
plt.errorbar(x_star.flatten(), prediction['f_star'].mean(axis=0), yerr=prediction['f_star'].std(axis=0))

Any help is appreciated!!


Hi,

It looks like what’s happened here is that the posterior is concentrated around a large lengthscale l (you can see its mode is around 5), and the value for sigma – the part of the data explained by noise – is large, at just under 2. Putting these two things together, it’s predicting an almost flat function (large lengthscale), and almost all of the variation as being due to noise (large sigma).
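
If you want to confirm this on your side, a quick way is to print a posterior summary of the two hyperparameters (pm.summary wraps ArviZ's summary in PyMC3):

with gp_model:
  # With the current model, the mean of l should come out large and
  # sigma should be close to the overall spread of y.
  print(pm.summary(trace, var_names=["l", "sigma"]))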

Here are some ideas to avoid this from happening:

  • First off, it seems like you’re not estimating the marginal variance of the GP. That could be a problem, because if I’m reading the docs correctly, that means you’re assuming the GP has a marginal variance of 1, which may be too small. If the GP has too small a variance to explain the data, the solution that it’s all due to noise becomes more likely. Check out the code in the example you linked for how to estimate the marginal variance. The parameter of interest is η, as seen here:
    η = pm.HalfCauchy("η", beta=1)
    cov = η ** 2 * pm.gp.cov.Matern52(1, ℓ)

This is in the code just under “Coding the model in PyMC3”. Multiplying the covariance function in this way gives you a marginal variance of η², which will be inferred together with the rest of the model (see the sketch after this list for how it could slot into your code).

  • Secondly, your prior for sigma has a large beta value, allowing it to be large. If you want to discourage explaining the data with noise, you could make that quite a bit smaller. That indicates your prior belief that the function is observed only with a small amount of noise.
  • Similarly, the prior on the lengthscale will discourage small lengthscales, but it looks to me like the one in the generated data is actually quite small. You could try to put e.g. a HalfCauchy or HalfNormal on it too. But I think your current prior ought to be OK also.
  • Finally, it looks like you’re using HamiltonianMC. That could be an issue too; I’m not sure how it’s tuned in PyMC3, but NUTS is usually a lot more reliable. If you don’t specify step in pm.sample, I believe NUTS is the default, so I’d recommend doing that. I would also recommend letting pm.sample handle the burn-in through its tune argument (tuning samples are discarded automatically) rather than slicing the trace manually yourself.
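
Putting these points together, here is a rough sketch of how your model block could look; the specific prior values (the beta for η, the HalfNormal scale for sigma) are only illustrative guesses rather than tuned recommendations:

with pm.Model() as gp_model:
  # Lengthscale and marginal standard deviation of the GP
  l = pm.Gamma("l", alpha=2, beta=1)
  eta = pm.HalfCauchy("eta", beta=1)
  cov_func = eta ** 2 * pm.gp.cov.Matern52(1, ls=l)
  gp = pm.gp.Latent(cov_func=cov_func)
  f = gp.prior("f", X=x[:, None])
  # A tighter noise prior discourages explaining everything as noise
  sigma = pm.HalfNormal("sigma", sigma=0.5)
  y_observed = pm.Normal("y_observed", mu=f, sigma=sigma, observed=y)
  # No step argument, so NUTS is used; tune replaces the manual burn-in
  trace = pm.sample(1000, tune=1000, return_inferencedata=False)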

Hope this helps; curious to hear how it goes!

Best wishes,
Martin


Hi Martin,

Thank you for your concrete answers. It turns out that I had falsely specified the observed y value:
y_observed = pm.Normal("y_observed", mu=f, sigma=sigma, observed=y[:,None])
It should be: y_observed = pm.Normal("y_observed", mu=f, sigma=sigma, observed=y)

Following your suggestions, I am wondering if you have any recommendations for prior distributions. What are good priors for small lengthscales, and what are good priors for large lengthscales? In my class, the teacher usually says a Gaussian/truncated Gaussian is a good choice, but I get lost among the many prior distributions I can choose from in PyMC3.

Thank you very much!!!

Hi again,

I’m glad you found the issue! Is the fit much better now?

A few pointers for the priors:

For the lengthscale: Usually, the most important thing is to avoid lengthscales that are too small (because those correspond to functions that vary extremely quickly, and end up overfitting to the data). Some different priors have been proposed, and the Gamma and Inverse-Gamma are usually reasonable, because they avoid a lengthscale of zero and also discourage very short ones, which aren’t great either. So the one you used isn’t too bad, really. There is a bit of discussion in the Stan manual which might be helpful too. The distributions should (almost) all be available in PyMC3; just be careful, as the parameterisation sometimes differs.

It can sometimes be helpful to plot the priors. For example, here’s a way to plot the one you were using:

import pymc as pm # Use pymc3 if using v3, I'm using v4 which drops the number
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0.01, 5, 100)

with pm.Model() as m:

    test = pm.Gamma('l', alpha=2, beta=1)

log_probs = pm.logp(test, x).eval()
plt.plot(x, np.exp(log_probs))

You should see that the probability density is (close to) zero at zero, rises to a peak, and then falls off again for large lengthscales. You could try some other values for alpha and beta to shift the mode around, and that would express a bit of a preference for shorter vs. longer lengthscales. As your teacher says, in general a (half-)Gaussian is not a bad option, but for lengthscales I would not recommend it, because it does not rule out a lengthscale of zero, which you really want to avoid.
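
As a variation on the same idea, you could overlay a couple of candidate lengthscale priors (e.g. the Gamma you already have and an InverseGamma, both available in PyMC) and pick whichever matches the range of lengthscales you find plausible; the parameter values below are arbitrary:

import pymc as pm  # as above, I'm on v4
import numpy as np
import matplotlib.pyplot as plt

l_grid = np.linspace(0.01, 5, 200)

with pm.Model():
    gamma_prior = pm.Gamma("l_gamma", alpha=2, beta=1)
    invgamma_prior = pm.InverseGamma("l_invgamma", alpha=3, beta=2)

# Both densities vanish at zero, ruling out pathologically small lengthscales
plt.plot(l_grid, np.exp(pm.logp(gamma_prior, l_grid).eval()), label="Gamma(2, 1)")
plt.plot(l_grid, np.exp(pm.logp(invgamma_prior, l_grid).eval()), label="InverseGamma(3, 2)")
plt.xlabel("lengthscale")
plt.legend()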

These are just my thoughts; others may have more informed opinions! In general, I think the priors you chose are reasonable defaults, but you may want to tweak them if you have more knowledge, e.g. if you know that the observations aren’t too noisy, as I mentioned in my first answer. Overall, I would guess that the main problem with your approach was the missing marginal variance and the bug you found in the likelihood, rather than the choice of priors…
