Gaussian process using noisy data training points

Of course, I have included my code below and a screenshot of the plot it produces.

I believe part of the problem is that the best-fitting noise parameter comes out considerably smaller than the actual scatter in the data, meaning it is easier for the GP to pass through the data points than to smooth out and treat them as noisy. However, I also think there is an additional coding issue, as when I try to add in the additional data uncertainty this code behaves very differently from how it does on the full data set.
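As a quick sanity check on this (a sketch only, using mp and y from the code below), I can compare the fitted noise with the raw scatter in the data:

print("MAP noise sd:", mp["s"])        # fitted observation noise
print("empirical sd of y:", y.std())   # raw scatter of the observations
# If the fitted noise is far smaller than the scatter, the GP is
# interpolating through the points rather than treating them as noisy.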

import numpy
import pandas
import pymc3 as pm

data = pandas.read_csv('exported_stress_fsw_311.csv')
data = data[data['load']==0]
data = data[data['y']==0]
# Keep every second row to thin the data set
data = data[::2]
x = data['x'].to_numpy()
x = x[:,None]
y = data['s11'].to_numpy()

X_new = numpy.linspace(-50, 50, 100)[:, None]

with pm.Model() as model:
    # Length-scale of the covariance function
    l = pm.Gamma("l", alpha=2, beta=1)
    # Signal amplitude (this scales the covariance, so it is not the observation noise)
    n = pm.HalfCauchy("n", beta=5)
    # Covariance function
    cov = n ** 2 * pm.gp.cov.Matern52(1, l)
    # Specify the GP; the mean function defaults to zero since none is given
    gp = pm.gp.Marginal(cov_func=cov)
    # Prior over the observation noise
    s = pm.HalfCauchy("s", beta=5)
    # The marginal likelihood method fits the model
    y_ = gp.marginal_likelihood("y", X=x, y=y, noise=s)
    # Find the parameters that best fit the data
    mp = pm.find_MAP()
    # .conditional gives the predictive distribution at the X_new values
    f_pred = gp.conditional("f_pred", X_new)
    # Draw posterior predictive samples of f_pred at the new x values
    pred_samples = pm.sample_posterior_predictive([mp], var_names=['f_pred'], samples=2000)
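For what it is worth, one variation I have considered is tightening the noise prior so the optimiser cannot collapse it towards zero (a sketch only; the HalfNormal scale of 10 is a guess and would need tuning to the scale of s11):

with pm.Model() as model_alt:
    l = pm.Gamma("l", alpha=2, beta=1)
    eta = pm.HalfCauchy("eta", beta=5)
    cov = eta ** 2 * pm.gp.cov.Matern52(1, l)
    gp = pm.gp.Marginal(cov_func=cov)
    # Informative noise prior (the scale is an assumption, not from my data)
    s = pm.HalfNormal("s", sigma=10)
    y_ = gp.marginal_likelihood("y", X=x, y=y, noise=s)
    mp = pm.find_MAP()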

As a comparison, this is the plot produced when I use the entire data set:
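In case it is relevant, both plots are produced roughly like this (a sketch using pm.gp.util.plot_gp_dist; styling details omitted):

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 5))
# Band of posterior predictive GP samples over X_new
pm.gp.util.plot_gp_dist(ax, pred_samples["f_pred"], X_new)
ax.plot(x, y, "ok", ms=3, label="observed data")
ax.set_xlabel("x")
ax.set_ylabel("s11")
ax.legend()
plt.show()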

As I said previously, I also have an additional error function from my data set which I can add to the noise term. This seems to work for the full data set (it just makes the uncertainty larger), but for the partial data set it causes the plot to completely mess up.
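Concretely, the way I am adding that uncertainty is by folding the per-point error into the noise in quadrature, roughly like this (a sketch; 'error' is a stand-in name for the error column in my CSV, and I am not certain that passing a vector as noise is actually supported by Marginal):

err = data['error'].to_numpy()  # hypothetical column name for the per-point error

with pm.Model() as model_err:
    l = pm.Gamma("l", alpha=2, beta=1)
    eta = pm.HalfCauchy("eta", beta=5)
    cov = eta ** 2 * pm.gp.cov.Matern52(1, l)
    gp = pm.gp.Marginal(cov_func=cov)
    s = pm.HalfCauchy("s", beta=5)
    # Total per-point sd: learned noise and known measurement error in quadrature
    total_sd = pm.math.sqrt(s ** 2 + err ** 2)
    y_ = gp.marginal_likelihood("y", X=x, y=y, noise=total_sd)
    mp = pm.find_MAP()

Thank you!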
