Representing data from a Gaussian process

So to check what’s going on there, compare the prior and posterior for s. I suspect it will either be unchanged or become very small (probably the latter). I could be wrong, though; if it’s a reasonable value, that’s great! But you won’t be able to make predictions with noise, because if you have 9 observed points and you’re trying to make predictions over 100 new points, the model doesn’t know how to distribute your 9 standard deviations. If you don’t care about predicting noise, just about epistemic uncertainty, then this shouldn’t matter. There are a few different scenarios you could try, depending on what you’re trying to actually do with the GP:

  1. Use every individual measured value (if you have access to them). As an experimentalist-turned-modeller, one of the oddly unintuitive barriers to overcome for me and others around me is that you should avoid fitting on aggregated data at all costs, particularly if your data is at all non-linear. Aggregation can hide all sorts of nuances in the data, and can lead your model to perform really poorly. That might be out of your power, but I’d see if you can persuade someone to track down the original measurements.
  2. You said above “I was having issues with the sampling taking each point as having zero error”. Presumably in that scenario you were still learning s, as in y_ = gp.marginal_likelihood("y", X=x, y=y, noise=s)? If you had noise=0 I can see why that would be problematic, but I would have thought noise=s should have been fine. You could also try an ExpQuad kernel; it’s a bit less wiggly than Matern52, so (I think) more likely to attribute small variations to noise rather than reality.
  3. You could try incorporating the observed uncertainty into y directly, by modeling y as a Normal RV with the appropriate sigma, similar to how uncertainty is incorporated into the x values in the Mauna Loa example. I haven’t tried this, so it might not even be possible, just a thought.
  4. If your standard deviations are relatively similar, or at least not systematically different, you could take the mean stdev and use it as a floor for your learned noise. I.e., y_ = gp.marginal_likelihood("y", X=x, y=y, noise=s), but where s = pm.Deterministic('s', np.mean(stds) + pm.HalfCauchy("s_", beta=5)).
  5. If there is a systematic trend to your noise, you might consider implementing a Heteroskedastic GP. Essentially, that example learns two independent latent GPs, one for the mean behavior and one for the noise, then combines them to model the observations. For this it would still be ideal to have access to the individual measurements. If not, you’ll probably want an additional likelihood that relates the noise-GP to the observed stdevs. Again, I haven’t actually tried that, so buyer beware…

Hopefully that helps!