Analysing Gaussian data

  1. This method does not seem to have much confidence to it and I am looking for other possible suggestions on what I could do.

Here it would help to be clear about what you mean by confidence. My initial guess is that your predictive distributions for held-out training points are (1) centered away from the true values, and (2) have relatively large uncertainty intervals. Is that the case?

To my understanding, I can’t use the RMSE as I haven’t always got actual data to compare the predictive points to.

But you could train the GP on a subset of the data and evaluate it on the remaining held-out points. Do you see any clear barriers to doing that, aside from the extra time and computation?
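
To make that concrete, here is a minimal sketch of a hold-out evaluation. It uses scikit-learn's GP purely as a stand-in for your PyMC model to show the mechanics, and x and y are synthetic placeholders for your data:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 60)[:, None]           # inputs, shape (N, 1)
y = np.sin(x[:, 0]) + rng.normal(0, 0.1, 60)  # noisy observations

train = rng.random(60) < 0.8                  # random ~80/20 split
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=0.1 ** 2)
gp.fit(x[train], y[train])

# RMSE on the points the model never saw during fitting
rmse = np.sqrt(np.mean((gp.predict(x[~train]) - y[~train]) ** 2))
print(f"held-out RMSE: {rmse:.3f}")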

I am planning on looking at percentage changes between RMSE values to give a numerical point at which this occurs.

If RMSE is the only metric of interest, you may get just as much mileage out of something similar and non-statistical, such as a radial basis function smoother. I think the GP’s most appealing feature is its uncertainty quantification, which doesn’t necessarily get used in a root-mean-square-error calculation.
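
For comparison, a hedged sketch of that non-statistical alternative using SciPy's RBFInterpolator, reusing the same synthetic split as above (the smoothing value is an arbitrary placeholder):

import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 60)[:, None]
y = np.sin(x[:, 0]) + rng.normal(0, 0.1, 60)
train = rng.random(60) < 0.8

# Radial basis function smoother: a point fit with no uncertainty estimates
rbf = RBFInterpolator(x[train], y[train], smoothing=1.0)
rmse = np.sqrt(np.mean((rbf(x[~train]) - y[~train]) ** 2))
print(f"held-out RMSE (RBF smoother): {rmse:.3f}")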

  1. If there are any other goodness-of-fit methods which could be appropriate

Depending on your application, you may be interested in how well calibrated the GP is. In short, this is how often the held-out values fall within the predictive credible intervals. If you calculate the 90% credible interval for each held-out value, and repeat this experiment across many held-out subsets, you would hope that the fraction of the time that the true value falls in that 90% interval is close to 90%. The RMSE and the calibration together would supply enough information about goodness-of-fit for many applications.
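
Here is a minimal sketch of that calibration check, assuming Gaussian predictive distributions; mu, sd, and y_true are stand-ins for the GP's predictive means, predictive standard deviations, and the held-out observations:

import numpy as np
from scipy import stats

# Stand-ins for the GP's predictive output on held-out points
rng = np.random.default_rng(1)
mu = np.zeros(200)
sd = np.ones(200)
y_true = rng.normal(mu, sd)          # well calibrated by construction

z = stats.norm.ppf(0.95)             # half-width multiplier for a 90% interval
inside = (y_true >= mu - z * sd) & (y_true <= mu + z * sd)
print(f"empirical coverage of the 90% intervals: {inside.mean():.2%}")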

On another topic, I have standard deviations for my actual data set produced from the fitting and experimental process. I am currently trying to prove these are random and therefore just scalers using a runs test; however, if I cannot, would it be appropriate to add them to this line in the noise section, or is that statistically wrong?

In principle, this is a smart thing to do if you have these error estimates ahead of time. I would modify your model, however, to make this fit in. The code block below isn’t placing a prior over the GP function draw; it’s actually placing a prior over the additional additive noise that is tacked on in the call to gp.marginal_likelihood.

#Placing priors over the function
s = pm.HalfCauchy("s", beta=5)

Your actual prior over that function is declared via the prior on the variable n, since you are rescaling the covariance function by n**2 in the line cov = n ** 2 * pm.gp.cov.Matern52(1, l). My opinionated recommendation is to get rid of s entirely and just plug in your error values so that it reads
y_ = gp.marginal_likelihood("y", X=x, y=y, noise=error)

This expresses the assumption that the underlying function is smooth and that your data are noisy, with the noise scale captured in error.
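
Putting it together, a minimal sketch of the modified model, assuming the PyMC3-style API from your snippet; x, y, and error are synthetic placeholders, with error standing in for your measured standard deviations:

import numpy as np
import pymc3 as pm

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 30)[:, None]
error = np.full(30, 0.1)                      # known per-point noise scale
y = np.sin(x[:, 0]) + rng.normal(0, error)

with pm.Model() as model:
    # Priors over the function itself: lengthscale l and amplitude n
    l = pm.Gamma("l", alpha=2, beta=1)
    n = pm.HalfCauchy("n", beta=5)
    cov = n ** 2 * pm.gp.cov.Matern52(1, l)
    gp = pm.gp.Marginal(cov_func=cov)

    # Known measurement error plugged in directly; no extra noise prior s
    y_ = gp.marginal_likelihood("y", X=x, y=y, noise=error)
    trace = pm.sample(1000, tune=1000, chains=2)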
