Parameter sampling

I’m writing up a report on some code I have written, and I want to check that what I am saying is correct and that I have the right context for what the code below does.

I use a Matern32 covariance function, whose parameter l defines the lengthscale and is given a Gamma prior. The noise is then modelled with a Half Cauchy distribution (because the noise is always positive) and is used to scale the covariance function.

I have switched to pm.sample to estimate the parameters instead of the MAP function, but I am not exactly sure what the sample function does.

I am also a bit unsure what the parameter s represents, so any input on this would be good. I believe it models the noise on the training data points supplied to the model.

If anyone can help explain what these parameters mean and how they are being estimated, that would be very helpful, as I’m currently getting confused.

# new x values at which to infer the function over that range
# linspace(start, stop, num)
import numpy
import pymc3 as pm

# x and y are the training inputs and outputs, defined elsewhere
X_new = numpy.linspace(-50, 50, 100)[:, None]

with pm.Model() as model:
    # Lengthscale
    l = pm.Gamma("l", alpha=2, beta=1)
    # Scaling of the covariance function
    n = pm.HalfCauchy("n", beta=5)
    # Covariance
    cov = n ** 2 * pm.gp.cov.Matern32(1, l)
    # Specify the GP; the default mean function is zero since none is specified
    gp = pm.gp.Marginal(cov_func=cov)

    # Observation noise
    s = pm.HalfCauchy("s", beta=5)
    # The marginal likelihood method fits the model
    y_ = gp.marginal_likelihood("y", X=x, y=y, noise=s)
    # Find the parameters that best fit the data
    # mp = pm.find_MAP()
    trace = pm.sample()
    # Conditional distribution for predictions given the X_new values
    f_pred = gp.conditional("f_pred", X_new)

    # Predict the distribution of samples at the new x values
    pred_samples = pm.sample_posterior_predictive(trace, var_names=["f_pred"], samples=2000)
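For context, one way the predictive draws could be visualized afterwards (a sketch, not part of the model above; it assumes matplotlib and PyMC3's pm.gp.util.plot_gp_dist helper, with x and y being the training data):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# pred_samples["f_pred"] has shape (n_draws, len(X_new))
pm.gp.util.plot_gp_dist(ax, pred_samples["f_pred"], X_new)
ax.plot(x, y, "ok", label="training data")
ax.legend()
plt.show()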

When s is zero, you are assuming that the modelled function f passes exactly through the points y. When it’s nonzero, you are allowing the values of y to deviate from f, and how far they are allowed to do so is controlled by the size of s.
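To make that concrete, here is a minimal sketch of the observation assumption y = f(x) + Normal(0, s) noise; the sine function and the two noise levels are just stand-ins, not anything from your model:

import numpy as np

rng = np.random.default_rng(0)
f_true = np.sin(np.linspace(-5, 5, 20))   # stand-in for the latent function f
s_small, s_large = 0.01, 1.0              # hypothetical small and large noise levels
y_tight = f_true + rng.normal(0, s_small, size=f_true.shape)  # stays very close to f
y_loose = f_true + rng.normal(0, s_large, size=f_true.shape)  # scatters widely around f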

As far as the lengthscale l is concerned, you need to make sure that most of the probability mass for the prior on l is on feasible distances between points. If the largest interpoint distance between x coordinates is 5 and the smallest is 1, then you would want a prior that puts most of its mass between those two values, like a uniform over [1, 5] or similar. The Gamma prior could be perfectly fine for that, but you may want to plot its PDF to make sure it is supported on distances actually present in your data.
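For example, a quick way to check that (a sketch using scipy and matplotlib; note that scipy's scale is 1/rate, so scale=1/beta matches PyMC's Gamma parameterization):

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

alpha, beta = 2, 1                      # the Gamma prior used for l above
ls = np.linspace(0.01, 10, 500)
plt.plot(ls, stats.gamma(a=alpha, scale=1 / beta).pdf(ls))
plt.xlabel("lengthscale l")
plt.ylabel("prior density")
plt.title("Gamma(2, 1) prior on l")
plt.show()

If most of that density sits below the smallest interpoint distance, or above the largest, you would want to adjust alpha and beta accordingly.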

As far as sampling is concerned, there’s quite a bit to unpack. In short, pm.sample draws samples from the joint posterior over l, n, and s using MCMC (NUTS by default), instead of returning the single point estimate that find_MAP gives you. Here’s a blog post that may help, and here’s a visualization that shows how the NUTS algorithm (the one being used in your code) traverses the posterior distribution.
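Once sampling has finished, a quick way to see what was estimated (a sketch; pm.summary and pm.traceplot are the standard PyMC3 helpers, and the model and trace names match your code):

with model:
    # Posterior means, standard deviations, and credible intervals for l, n, and s
    print(pm.summary(trace))
    # Marginal posterior densities and the raw sampling traces
    pm.traceplot(trace)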


Thank you! One last thing.

Currently I model s with a Half Cauchy prior just to account for the noise being positive. But I actually have the standard deviation of the training data points from the fitting process. Would it be statistically correct, and perhaps more accurate, to plug in the standard deviation^2 instead of this s prior?

Yes, you can go ahead and use that; it’s making use of what we might call a plug-in estimate of the true variance. Here, I think you want to supply it as noise=s rather than noise=s**2, because the marginal_likelihood method expects a standard deviation rather than a variance.
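Concretely, that would look something like this (a sketch; known_sigma is a placeholder for the standard deviation you already have, and x, y are your training data):

known_sigma = 0.3  # placeholder: the standard deviation measured from the fitting process

with pm.Model() as model_fixed_noise:
    l = pm.Gamma("l", alpha=2, beta=1)
    n = pm.HalfCauchy("n", beta=5)
    cov = n ** 2 * pm.gp.cov.Matern32(1, l)
    gp = pm.gp.Marginal(cov_func=cov)
    # Pass the known standard deviation directly instead of giving s a prior;
    # marginal_likelihood treats `noise` as a standard deviation, not a variance
    y_ = gp.marginal_likelihood("y", X=x, y=y, noise=known_sigma)
    trace = pm.sample()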