Periodic, constrained, input-error Gaussian process exhibits strange behavior at the boundary?

I’m evaluating some experimental data on the directivity of a Beverage antenna.

Power ratio measurements (constrained to [0, 1]) have been recorded at multiple angles, with constant relative precision in decibels. The angle control was fairly loose: expected accurate only to within ±5°. The model seems fairly simple:

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pymc as pm

experiment = {"Azimuth (deg)":
              np.array([0, 11.25, 22.5, 33.75, 45, 56.25, 67.5, 78.75, 90, 101.25, 112.5,
                        123.75, 135, 146.25, 157.5, 168.75, 180, 191.25, 202.5, 213.75, 225,
                        236.25, 247.5, 258.75, 270, 281.25, 292.5, 303.75, 315, 326.25, 337.5,
                        348.75, 360]),
              "Ez/Ezmax":
              np.array([1, 0.997, 0.944, 0.841, 0.708, 0.562, 0.398, 0.355, 0.316,
                        0.316, 0.282, 0.251, 0.282, 0.282, 0.224, 0.1, 0.1, 0.1, 0.224,
                        0.282, 0.282,0.251, 0.282, 0.316, 0.316, 0.355, 0.398, 0.562, 0.708,
                        0.841, 0.944, 0.977, 1])}

with pm.Model() as model:
    # Mean roughly the data spacing.
    ell = pm.Gamma("ell", alpha=6, beta=.5)
    # Weakly informative prior
    sigma = pm.HalfNormal("sigma", sigma=5)

    # Periodic input range, obviously.
    cov = sigma**2 * pm.gp.cov.Periodic(1, 360, ell)
    gp = pm.gp.Latent(cov_func=cov)

    # Additive angle error - von Mises so the density of points near the
    # boundary wraps nicely. Concentration analogous to a normal sigma = 5 deg.
    # Unit conversion hackery: degrees -> radians for the prior, back to degrees for the GP.
    Xerror = pm.VonMises("X_error",
                         mu=experiment["Azimuth (deg)"][:, None] * np.pi / 180 - np.pi,
                         kappa=1 / (5 * np.pi / 180)**2,
                         shape=experiment["Azimuth (deg)"][:, None].shape) * 180 / np.pi

    f = gp.prior("f", Xerror)

    # Direct power-ratio equivalent of a relative measurement error of 0.2 dB.
    measurement_error = 0.02
    # Truncated normal error law to try to constrain the Gaussian process to [0, 1].
    pred = pm.TruncatedNormal("y", mu=f,
                              sigma=measurement_error * experiment["Ez/Ezmax"],
                              lower=0, upper=1,
                              observed=experiment["Ez/Ezmax"])


    idata = pm.sample(progress_bar=False,
                      nuts_sampler="nutpie")    
fig = plt.figure(figsize=(10, 4))
ax = fig.gca()
pm.gp.util.plot_gp_dist(ax, az.extract(idata, var_names="f").transpose("sample", ...),
                        experiment["Azimuth (deg)"][:, None])
ax.plot(experiment["Azimuth (deg)"], experiment["Ez/Ezmax"], "or", ms=3, label="Observed")
plt.legend()
plt.title("Weird Fit")
plt.savefig("weird_fit.svg")
plt.clf()

Up to my copying that correctly, this should produce something like:

There are no convergence warnings, and by visual inspection of `plot_trace` everything except the upper end of `f` looks fine. The posterior predictive does indicate something's off, but that doesn't tell me much more than what I can see above. I'm really confused by this: the periodic kernel and the wrapped abscissa distribution should eliminate any "ringing" effects from the boundary, and the default mean function is the zero function, so the last thing I'd expect is for these data to make the model overshoot any points. The conditional distribution near the boundary also looks much wider than the error law would predict.

I get much more reasonable results if I drop the abscissa error, but given the variation in the derivative I expect it to be a fairly significant factor in the shape of the conditional distribution. Any help appreciated; I've got no more ideas in me.
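As a sanity check on the "no boundary effects" intuition: here is a minimal NumPy sketch of the Periodic kernel formula as given in the PyMC docs (unit amplitude and an illustrative lengthscale assumed, not taken from the model above). It confirms the covariance genuinely wraps, so 0° and 360° are the same input as far as the GP is concerned:

```python
import numpy as np

# PyMC's Periodic kernel (unit amplitude): exp(-sin^2(pi*|x - x'| / T) / (2*ell^2))
def periodic_k(x1, x2, period=360.0, ell=1.0):
    d = np.abs(x1 - x2)
    return np.exp(-np.sin(np.pi * d / period) ** 2 / (2 * ell**2))

print(periodic_k(0.0, 360.0))                         # 1.0 - a full period apart is zero distance
print(periodic_k(1.0, 359.0), periodic_k(-1.0, 1.0))  # identical - the distance wraps
```

So whatever is going on at the boundary, it is not the covariance function itself failing to wrap.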

CC @lucianopaz @bwengals

Could be a couple things going on here:

  1. The lengthscale for the Periodic kernel doesn’t have the same meaning as the lengthscale for a non-periodic kernel. I’d check the prior predictives. The lengthscale should generally be a fair bit larger than the distance between datapoints.
  2. You don't have to, but if I were you I'd model the period and put x in radians instead of degrees. The formula for the kernel you're using is:
k(x, x') = \exp\left( -\frac{\sin^2\left(\pi \lvert x - x' \rvert \frac{1}{T}\right)}{2\ell^2} \right)

I just think [0, 2\pi] is a nicer scale to work on.
  3. You may consider periodic splines for this. See the “Cyclic cubic regression spline basis example” from here. The spline result and the GP result may differ, but they shouldn't conflict with each other. This also gives you a way to sanity-check results.
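On point 1, a quick way to see what the Periodic lengthscale actually does, without running the full model, is to draw prior samples directly from the kernel in NumPy (a standalone sketch with assumed values, not the original PyMC model) and eyeball whether the draws are smoother than the 11.25° data spacing:

```python
import numpy as np

# Prior predictive sketch: GP draws from the Periodic kernel at a few lengthscales.
def periodic_cov(x, period=360.0, ell=1.0, sigma=1.0, jitter=1e-6):
    d = np.abs(x[:, None] - x[None, :])
    k = sigma**2 * np.exp(-np.sin(np.pi * d / period) ** 2 / (2 * ell**2))
    return k + jitter * np.eye(len(x))  # jitter for numerical stability

rng = np.random.default_rng(0)
x = np.linspace(0, 360, 100)
for ell in [0.1, 0.5, 2.0]:
    draws = rng.multivariate_normal(np.zeros(len(x)), periodic_cov(x, ell=ell), size=5)
    # Endpoints should (nearly) agree, since 0 deg and 360 deg are the same input.
    print(ell, np.max(np.abs(draws[:, 0] - draws[:, -1])))
```

Note the lengthscale here sits inside the sine, so it is effectively dimensionless: values of order 0.1-1 already give smooth functions, which is quite different from a lengthscale in degrees for a non-periodic kernel.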

Isn’t this backwards for the periodic lengthscale? I haven’t checked the prior predictives yet, but the docs say “[d]ivide the length-scale by 2 when initializing the kernel to recover the standard definition,” which I’ve already tried to do (going by the kernel cookbook, for the standard definition the lengthscale is about the data spacing). For grins, I increased the lengthscale anyway; it reduced (but did not eliminate) the overshoot, yet at even higher lengthscales the trend reversed.
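For what it's worth, the docstring note does check out algebraically. Here is a throwaway NumPy check (both kernel formulas written out by hand, not calling PyMC) showing that halving a "standard" lengthscale in PyMC's parameterization reproduces the standard kernel exactly:

```python
import numpy as np

# PyMC parameterization: exp(-sin^2(pi*d/T) / (2*ell^2))
def k_pymc(d, T, ell):
    return np.exp(-np.sin(np.pi * d / T) ** 2 / (2 * ell**2))

# "Standard" periodic kernel: exp(-2*sin^2(pi*d/T) / ell_std^2)
def k_standard(d, T, ell_std):
    return np.exp(-2 * np.sin(np.pi * d / T) ** 2 / ell_std**2)

d = np.linspace(0, 360, 50)
ell_std = 0.7  # arbitrary illustrative value
print(np.allclose(k_pymc(d, 360, ell_std / 2), k_standard(d, 360, ell_std)))  # True
```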

I can't remember. The main reason I suggested changing the lengthscale prior is that the GP is fitting through each of your data points, which means it's probably too small. That doesn't explain why it's not actually being periodic, though.