Behaviour of Gibbs kernel seems off

I am trying to draw samples from a GP prior with a Gibbs covariance, following the example in the API docs on the website. If I replace tanh_func with a function of my own choosing (here I sample the warping function from another GP), it is very clear that the resulting covariance matrix K is not positive semi-definite.

(I think it might be because, if you zoom into the plot of the warping function, there are small discontinuities. I am not sure why they arise, since a monotonic transformation of a perfectly smooth function shouldn't introduce any discontinuities.)

import numpy as np
import pymc3 as pm 
import matplotlib.pyplot as plt
import theano.tensor as tt

X = np.linspace(-10, 10, 1000)[:,None]
y = (X**3 - X**2 + X)

lengthscale = 2
eta = 2.0
cov = eta**2 * pm.gp.cov.ExpQuad(1, lengthscale)
K_base = cov(X).eval()
plt.matshow(K_base)

###### Samples from Gibbs covariance

def warp_gp_func(x, K):
    # draw one sample from a GP with covariance K (plus jitter) to use as the log-lengthscale
    sample = pm.MvNormal.dist(mu=np.zeros(K.shape[0]), cov=K + np.eye(1000)*1e-5,
                              shape=(len(X),)).random(size=1).T * tt.ones(shape=(1000, 1))
    # exponentiate so the lengthscale is strictly positive
    return tt.exp(sample)

def tanh_func(x, ls1, ls2, w, x0):
    """
    ls1: left saturation value
    ls2: right saturation value
    w:   transition width
    x0:  transition location.
    """
    return (ls1 + ls2) / 2.0 - (ls1 - ls2) / 2.0 * tt.tanh((x - x0) / w)

ls1 = 0.05
ls2 = 0.6
w = 0.3
x0 = 1.0
#cov = pm.gp.cov.Gibbs(1, tanh_func, args=(ls1, ls2, w, x0))
cov = pm.gp.cov.Gibbs(1, warp_gp_func, args=(K_base,))

#th = tanh_func(X, ls1, ls2, w, x0).eval()
wf = warp_gp_func(X, K_base).eval()
plt.figure(figsize=(14, 4))
plt.plot(X, wf)
plt.ylabel("lengthscale")
plt.xlabel("X")
plt.title("Lengthscale as a function of X")

K = cov(X).eval()
plt.matshow(K)

### Below block throws an error

plt.figure(figsize=(14, 4))
plt.plot(X, pm.MvNormal.dist(mu=np.zeros(K.shape[0]), cov=K + np.eye(1000)*1e-6, shape=(1000,)).random(size=3).T)
plt.title("Samples from the GP prior")
plt.ylabel("y")
plt.xlabel("X");



I’m not exactly sure what the problem is, but it seems like it might be coming from the warp function and not from the implementation of the Gibbs kernel?
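
For instance, reusing the objects from your script, a quick check that compares the tanh lengthscale from the docs example against your GP-sampled one should show whether the kernel itself is the culprit:

cov_tanh = pm.gp.cov.Gibbs(1, tanh_func, args=(ls1, ls2, w, x0))
K_tanh = cov_tanh(X).eval()
# the tanh version should come out (numerically) PSD, unlike the GP-warped one
print("min eigenvalue, tanh lengthscale:", np.linalg.eigvalsh(K_tanh).min())
print("min eigenvalue, GP lengthscale:  ", np.linalg.eigvalsh(K).min())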

One thing I tried was, instead of exponentiating your samples, squashing them through an inverse logit, but that gave me the same problem. I have a sneaking suspicion the exponential might be amplifying numerical errors somewhere.
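
Roughly the kind of thing I tried, just a sketch that swaps the exp for pm.math.invlogit (the 0.05/0.6 bounds simply mirror the tanh example and are otherwise arbitrary):

def warp_gp_func_logit(x, K):
    sample = pm.MvNormal.dist(mu=np.zeros(K.shape[0]), cov=K + np.eye(1000)*1e-5,
                              shape=(len(X),)).random(size=1).T * tt.ones(shape=(1000, 1))
    # squash to a bounded positive range instead of exponentiating
    return 0.05 + 0.55 * pm.math.invlogit(sample)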

While PyMC3's pm.MvNormal will fail quite loudly if you try to pass a matrix that is not positive semi-definite as a covariance, I've found that NumPy's np.random.multivariate_normal will still produce samples and just warn you. This definitely breaks the mathematics, but if I replace your last block of code with the following I'm able to draw samples:

plt.figure(figsize=(14, 4))
plt.plot(X, np.random.multivariate_normal(mean=np.zeros(K.shape[0]), cov=K + np.eye(1000)*1e-9, size=3).T)
plt.title("Samples from the GP prior")
plt.ylabel("y")
plt.xlabel("X");

Hope this helps!


I wasn't able to figure out why your example as written didn't work. It looks like you are using exp(GP) as the lengthscale of another GP. Does this give something sensible for samples from the GP prior?

import numpy as np
import pymc3 as pm 
import matplotlib.pyplot as plt
import theano.tensor as tt

X = np.linspace(-10, 10, 300)[:,None]  # I changed the 1000 to 300 to speed things up here

with pm.Model() as model:
    
    eta = 2.0
    lengthscale = 2.0
    base_cov = eta**2 * pm.gp.cov.ExpQuad(1, lengthscale)
    base_gp = pm.gp.Latent(cov_func=base_cov)
    
    log_wf = base_gp.prior("log_wf", X=X)
    wf = pm.Deterministic("wf", pm.math.exp(log_wf))
    
    def warp_gp_func(x, wf):
        return wf
    
    cov = pm.gp.cov.Gibbs(1, warp_gp_func, args=(wf,))
    
    K = pm.Deterministic("K", cov(X))
    
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", X=X)

with model:
    tr = pm.sample(tune=1000, draws=100, cores=1)
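
To see what it produces, something like this plots the prior draws and checks one of the sampled covariance matrices (assuming pm.sample returned a MultiTrace, as above):

plt.figure(figsize=(14, 4))
plt.plot(X, tr["f"].T, alpha=0.4)
plt.title("Draws of f from the prior model")

K0 = tr["K"][0]
print("symmetric:", np.allclose(K0, K0.T))
print("smallest eigenvalue:", np.linalg.eigvalsh(K0).min())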

The Gibbs covariance matrices here at least look symmetric. Is this kind of what you’re after?


Thank you. The behaviour of np.random.multivariate_normal is a bit worrying, because even if I can sample, the fact that the matrix isn't PSD might break something downstream. :frowning:


thank you, this seems reasonable!


Hi @bwengals, the solution you presented trains OK, but prediction fails: I think it cannot extrapolate the lengthscale function to the X_new points, and the conditional distribution needs that in order to compute the cross-covariance terms like K_{*n} (the covariance between the new inputs and the training inputs).

X = np.linspace(-1, 1, 100)[:,None]
y = (X**3 - X**2 + X)

X_new = np.linspace(-1, 1, 1000)[:,None]

with pm.Model() as model:

    eta_prior = pm.Gamma('eta_prior', alpha=2, beta=0.75)
    l_prior = pm.Gamma('l_prior', alpha=2, beta=0.75)
    base_cov = eta_prior**2 * pm.gp.cov.ExpQuad(1, ls=l_prior)
    base_gp = pm.gp.Latent(cov_func=base_cov)
    log_wf = base_gp.prior("log_wf", X=X)
    wf = pm.Deterministic("wf", pm.math.exp(log_wf))

    def warp_gp_func(x, wf):
        return wf

    # Gibbs here is my own implementation (gibbs1d_pymc3, see the traceback below)
    cov = Gibbs(1, warp_gp_func, args=(wf,))

    K = pm.Deterministic("K", cov(X))

    gp = pm.gp.Marginal(cov_func=cov)
    #Xu = pm.gp.util.kmeans_inducing_points(20, X)
    #f = gp.prior("f", X=X)

    #sigma = pm.Gamma('sigma', alpha=1, beta=2)
    #trace_prior = pm.sample(draws=500, return_inferencedata=False)

    y_ = gp.marginal_likelihood("y", X=X, y=y, noise=0.01)

    trace = pm.sample(draws=100, tune=500, chains=1)

with model:
    f_pred = gp.conditional("f_pred", X_new)
    pred_samples = pm.sample_posterior_predictive(trace, vars=[f_pred], samples=1000)

Traceback (most recent call last):

  Input In [95] in <cell line: 1>
    f_pred = gp.conditional("f_pred", X_new)

  File ~/anaconda3/lib/python3.8/site-packages/pymc3/gp/gp.py:527 in conditional
    mu, cov = self._build_conditional(Xnew, pred_noise, False, *givens)

  File ~/anaconda3/lib/python3.8/site-packages/pymc3/gp/gp.py:473 in _build_conditional
    Kxs = self.cov_func(X, Xnew)

  File ~/anaconda3/lib/python3.8/site-packages/pymc3/gp/cov.py:82 in __call__
    return self.full(X, Xs)

  File ~/Desktop/Workspace/Kernel_Learning_Latent_GPs/kernels/gibbs1d_pymc3:67 in full
    return at.sqrt((2.0 * at.outer(rx, rz)) / (rx2 + rz2)) * at.exp(-1.0 * r2 / (rx2 + rz2))

  File ~/anaconda3/lib/python3.8/site-packages/theano/tensor/var.py:170 in __truediv__
    return theano.tensor.basic.true_div(self, other)

  File ~/anaconda3/lib/python3.8/site-packages/theano/graph/op.py:253 in __call__
    compute_test_value(node)

  File ~/anaconda3/lib/python3.8/site-packages/theano/graph/op.py:130 in compute_test_value
    required = thunk()

  File ~/anaconda3/lib/python3.8/site-packages/theano/graph/op.py:606 in rval
    thunk()

  File ~/anaconda3/lib/python3.8/site-packages/theano/link/c/basic.py:1771 in __call__
    raise exc_value.with_traceback(exc_trace)

ValueError: Input dimension mis-match. (input[0].shape[1] = 1000, input[1].shape[1] = 100)

Hmm, I think I understand. I think this might be tricky to do out of the box here, since warp_gp_func as it is now has to be a function of the inputs, and yours just returns the wf values tied to the training X. I know it's a bit of a bummer, but if I were you I'd try writing a new covariance class, using Gibbs as a starting point. I'd get rid of the warping function as an input and instead construct the lengthscale GP inside your GibbsGP class. Whether you are training or extrapolating is basically determined in the Covariance by whether Xs is set. Maybe that will give you enough flexibility?
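
Very roughly, a sketch of the skeleton I have in mind is below. The lengthscale_at method is just a placeholder (an arbitrary smooth positive function); in your case that is where the latent lengthscale GP would have to be evaluated, via its prior at the training X and its conditional at Xnew:

import numpy as np
import theano.tensor as tt
import pymc3 as pm


class GibbsGP(pm.gp.cov.Covariance):
    """Gibbs-style covariance that computes its own lengthscales,
    so it can be evaluated at both the training inputs and new inputs."""

    def __init__(self, input_dim, active_dims=None):
        super().__init__(input_dim, active_dims)

    def lengthscale_at(self, x):
        # PLACEHOLDER: any smooth, strictly positive function of x.
        # Replace this with an evaluation of the latent lengthscale GP.
        return (0.3 + 0.2 * tt.tanh(x)).flatten()

    def square_dist(self, X, Xs):
        X2 = tt.sum(tt.square(X), 1)
        Xs2 = tt.sum(tt.square(Xs), 1)
        sqd = -2.0 * tt.dot(X, tt.transpose(Xs)) + (
            tt.reshape(X2, (-1, 1)) + tt.reshape(Xs2, (1, -1))
        )
        return tt.clip(sqd, 0.0, np.inf)

    def full(self, X, Xs=None):
        X, Xs = self._slice(X, Xs)
        rx = self.lengthscale_at(tt.as_tensor_variable(X))
        if Xs is None:
            # training: covariance of X with itself
            rz = rx
            r2 = self.square_dist(X, X)
        else:
            # prediction: cross-covariance between X and Xnew
            rz = self.lengthscale_at(tt.as_tensor_variable(Xs))
            r2 = self.square_dist(X, Xs)
        rx2 = tt.reshape(tt.square(rx), (-1, 1))
        rz2 = tt.reshape(tt.square(rz), (1, -1))
        return tt.sqrt((2.0 * tt.outer(rx, rz)) / (rx2 + rz2)) * tt.exp(-1.0 * r2 / (rx2 + rz2))

    def diag(self, X):
        return tt.alloc(1.0, X.shape[0])

The important bit is the "if Xs is None" branch in full: that is where the covariance knows whether it is building the training matrix or the cross-covariance with the new inputs, so the lengthscale GP can be handled accordingly.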