Latent Gaussian process prediction problem

Hi again,

I’m glad you found the issue! Is the fit much better now?

A few pointers for the priors:

For the lengthscale: usually the most important thing is to avoid lengthscales that are too small, because those correspond to functions that vary extremely quickly and end up overfitting the data. Several priors have been proposed, and the Gamma and Inverse-Gamma are usually reasonable choices, because they place zero density on a lengthscale of zero and also discourage very short ones, which aren’t great either. So the one you used isn’t too bad, really. There is some discussion of lengthscale priors in the Stan manual which might be helpful too. Almost all of these distributions are also available in PyMC; just be careful, as the parameterisation sometimes differs.
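To illustrate the parameterisation caveat, here's a small sketch (using scipy rather than PyMC, with parameter values picked just for illustration): PyMC's Gamma(alpha, beta) treats beta as a rate, while scipy.stats.gamma takes a scale, which is 1/beta.

```python
import numpy as np
from math import gamma as gamma_fn
from scipy.stats import gamma, invgamma

alpha, beta = 2.0, 1.5  # arbitrary illustrative values
x = 2.0

# scipy.stats.gamma uses a *scale* parameter; PyMC's beta is a *rate*,
# so to match PyMC's Gamma(alpha, beta) you need scale = 1 / beta.
pdf_scipy = gamma.pdf(x, a=alpha, scale=1.0 / beta)

# The same density written out directly in the rate parameterisation:
pdf_manual = beta**alpha * x**(alpha - 1) * np.exp(-beta * x) / gamma_fn(alpha)

print(np.isclose(pdf_scipy, pdf_manual))  # the two parameterisations agree

# For InverseGamma(alpha, beta), scipy's invgamma with a=alpha and
# scale=beta matches PyMC's parameterisation (here beta really is a scale).
pdf_inv = invgamma.pdf(x, a=alpha, scale=beta)
```

So if you sanity-check a prior against scipy and the curves don't line up, the rate-vs-scale convention is the first thing to check.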

It can sometimes be helpful to plot the priors. For example, here’s a way to plot the one you were using:

import pymc as pm  # use pymc3 if on v3; I'm on v4, which drops the number
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0.01, 5, 100)

with pm.Model() as m:
    test = pm.Gamma('l', alpha=2, beta=1)

# Evaluate the prior's log-density on a grid and plot the density.
# (In v3 the logp API differs: roughly test.distribution.logp(x).eval().)
log_probs = pm.logp(test, x).eval()
plt.plot(x, np.exp(log_probs))
plt.xlabel('lengthscale')
plt.ylabel('prior density')
plt.show()

You should see that the density is zero at a lengthscale of zero, rises to a peak, and then falls off again for large lengthscales. You could try other values of alpha and beta to shift the mode around, which lets you express a preference for shorter vs. longer lengthscales. As your teacher says, a (half-)Gaussian is generally not a bad option, but for lengthscales I wouldn't recommend it, because it doesn't rule out a lengthscale of zero, which is exactly what you want to avoid.
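To make that concrete, here's a quick sketch (using scipy rather than PyMC, with the same alpha=2, beta=1 as above): the mode of a Gamma(alpha, beta) prior in the rate parameterisation sits at (alpha - 1) / beta for alpha > 1, so you can place it wherever you like, whereas a half-normal puts its highest density exactly at zero.

```python
import numpy as np
from scipy.stats import gamma, halfnorm

alpha, beta = 2.0, 1.0  # the prior from the snippet above

# Analytic mode of Gamma(alpha, beta), rate parameterisation, alpha > 1:
mode = (alpha - 1) / beta  # here: 1.0

# Check numerically: the density peaks near the analytic mode.
grid = np.linspace(0.01, 5, 1000)
dens = gamma.pdf(grid, a=alpha, scale=1.0 / beta)  # scale = 1 / rate
peak = grid[np.argmax(dens)]

# Gamma assigns zero density to a lengthscale of zero...
at_zero = gamma.pdf(0.0, a=alpha, scale=1.0 / beta)

# ...whereas a half-normal has its highest density exactly at zero,
# so a zero lengthscale is not ruled out.
hn_at_zero = halfnorm.pdf(0.0, scale=1.0)
```

Increasing alpha (or decreasing beta) pushes the mode towards longer lengthscales, and vice versa, which is one way to encode a rough preference.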

These are just my thoughts; others may have more informed opinions! I think the priors you chose are reasonable defaults, but you may want to tweak them if you have more knowledge, e.g. if you know that the observations aren’t too noisy, as I mentioned in my first answer. My guess is that the main problem with your approach was the missing marginal variance and the bug you found in the likelihood, rather than the choice of priors…
