Sparse GP with linear covariate?

Thanks for your help @bwengals. After looking through the links you provided and finding this helpful post, I think I figured out how to code it. The trick was telling the covariance function to use only selected input columns via `active_dims`.
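To illustrate the idea (a plain-NumPy sketch, not PyMC's actual code): `active_dims` amounts to slicing out the listed columns of the input before computing pairwise distances, so the other columns are ignored by the kernel. The squared-exponential kernel below is just for demonstration.

```python
import numpy as np

def rbf(A, ls=1.0):
    """Squared-exponential kernel on a 2-D input array of shape (n, d)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(A**2, 1)[None, :] - 2 * A @ A.T
    return np.exp(-0.5 * sq / ls**2)

rng = np.random.default_rng(0)
trt = rng.integers(0, 2, size=(50, 1)).astype(float)  # column 0: binary covariate
X = rng.uniform(0, 10, size=(50, 1))                  # column 1: continuous input
Z = np.column_stack([trt, X])

# "active_dims=[1]" behaves like keeping only column 1 of Z:
K_active = rbf(Z[:, [1]])
K_direct = rbf(X)
print(np.allclose(K_active, K_direct))  # True
```

So the covariance built on the stacked `Z` with `active_dims=[1]` is identical to one built on `X` alone, which is what lets the binary column go only into the mean function.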

Following the dataset simulation from the tutorial:

import numpy as np
import pymc3 as pm

np.random.seed(1)
n = 200  # The number of data points
X = 10 * np.sort(np.random.rand(n))[:, None]
ℓ_true = 1.0
η_true = 3.0
cov_func = η_true ** 2 * pm.gp.cov.Matern52(1, ℓ_true)
mean_func = pm.gp.mean.Zero()
f_true = np.random.multivariate_normal(
    mean_func(X).eval(), cov_func(X).eval() + 1e-8 * np.eye(n), 1
).flatten()
σ_true = 2.0

# Add a binary treatment effect (+5) to the first half of the data
beta_true = np.where(np.arange(n) < n/2, 5, 0)
f_true = f_true + beta_true
y = f_true + σ_true * np.random.randn(n)
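As a side note, the Matern 5/2 covariance used in the simulation can be written out in plain NumPy (this is the standard textbook formula, not PyMC's implementation), which is handy for sanity-checking without evaluating theano graphs:

```python
import numpy as np

# Matern 5/2: k(r) = η² (1 + √5·r/ℓ + 5r²/(3ℓ²)) exp(−√5·r/ℓ)
def matern52(X, ls=1.0, eta=3.0):
    r = np.abs(X[:, None, 0] - X[None, :, 0])  # pairwise distances, 1-D inputs
    s = np.sqrt(5.0) * r / ls
    return eta**2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)

X_demo = np.linspace(0, 10, 5)[:, None]
K = matern52(X_demo)
print(K.shape)   # (5, 5)
print(K[0, 0])   # 9.0, i.e. η² on the diagonal
```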

Fitting:

trt_status = (beta_true > 0).astype(int)  # binary treatment indicator
Z = np.column_stack([trt_status, X])      # column 0: treatment, column 1: X

with pm.Model() as model:

    ℓ = pm.Gamma("ℓ", alpha=2, beta=1)
    η = pm.HalfCauchy("η", beta=5)
    # GP covariance acts only on column 1 of Z (the continuous input X)
    cov = η ** 2 * pm.gp.cov.Matern52(2, ℓ, active_dims=[1])

    # Linear mean: coefficient beta on the treatment column, 0 on X
    beta = pm.Normal("beta", mu=0, sd=5)
    mean_fun = pm.gp.mean.Linear(coeffs=[beta, 0])

    Xu = pm.gp.util.kmeans_inducing_points(20, Z)
    gp = pm.gp.MarginalSparse(mean_func=mean_fun, cov_func=cov, approx="FITC")

    σ = pm.HalfCauchy("σ", beta=5)
    y_ = gp.marginal_likelihood("y_", X=Z, Xu=Xu, y=y, noise=σ)

    trace = pm.sample(1000)
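For anyone curious what `pm.gp.util.kmeans_inducing_points` is doing: as I understand it, it essentially runs k-means on the inputs (with some internal rescaling) and uses the cluster centers as inducing points. A simplified standalone sketch with SciPy's `kmeans2` (the rescaling step is omitted here, so this is an approximation of the helper, not its exact behavior):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Rebuild an input matrix like Z above: binary treatment + continuous X
rng = np.random.default_rng(1)
X_in = 10 * np.sort(rng.random(200))[:, None]
trt = (np.arange(200) < 100).astype(float)
Z_in = np.column_stack([trt, X_in])

# k-means cluster centers serve as the 20 inducing points;
# minit="points" seeds the centroids from actual data rows.
Xu_demo, _ = kmeans2(Z_in, 20, minit="points", seed=1)
print(Xu_demo.shape)  # (20, 2)
```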

Thanks also for the pointers about the sparse approaches and spatial data – indeed, I think the data density will be low relative to the lengthscale.
