Sparse GP with linear covariate?

I'm pretty new to PyMC3, and really to Bayesian modelling in general, so apologies ahead of time.

I'm interested in fitting a model that includes both a two-dimensional (spatial) Gaussian process and a linear coefficient for an indicator variable, on a large dataset.

I know I can do this using gp.Latent, but fitting takes a long time on large datasets because it is an O(N^3) operation, e.g. something like:

with pm.Model() as model:
    ℓ = pm.Gamma("ℓ", alpha=2, beta=1)
    η = pm.HalfCauchy("η", beta=5)

    cov = η ** 2 * pm.gp.cov.Matern52(1, ℓ)
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", X=X)

    beta_intercept = pm.Normal("intercept", mu=0, sd=10)
    beta_indicator = pm.Normal("beta", mu=0, sd=5)

    σ = pm.HalfCauchy("σ", beta=5)
    # observed data enter through `observed=`, with sd=σ as the noise term
    y_ = pm.Normal("y", mu=beta_intercept + f + beta_indicator * indicator, sd=σ, observed=y)

I'm intrigued by the gp.MarginalSparse construct and wondering if it would be possible to achieve something similar with it. My first thought was to write a custom mean function for gp.MarginalSparse that depends on my indicator variable (a separate mean for state = 1 vs. 0). I don't know if this is even possible or how I would code it. Would anybody be able to steer me in the right direction or provide an example?

Thanks in advance

Yup, this is definitely possible. For your gp.Latent example here, you should be able to rewrite it in a pretty straightforward way using the Linear mean function. Here's an example of a custom mean function being used, though I think you could just use Linear.
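
In case it helps, a custom mean function just subclasses pm.gp.mean.Mean and implements __call__. Here is a minimal sketch (the class name IndicatorMean and the assumption that the 0/1 indicator sits in a column of X are mine, not from the thread) of a separate mean for state = 1 vs. 0:

import pymc3 as pm

class IndicatorMean(pm.gp.mean.Mean):
    # Shifts the mean by `beta` wherever the indicator column of X equals 1 (hypothetical example).
    def __init__(self, beta, indicator_dim=0):
        self.beta = beta                    # PyMC3 random variable for the shift
        self.indicator_dim = indicator_dim  # which column of X holds the 0/1 indicator

    def __call__(self, X):
        # length-n vector: beta where indicator == 1, zero otherwise
        return self.beta * X[:, self.indicator_dim]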

The reason the GP has its own mean functions is to make prediction easier for the user. There's no problem for fitting, but you'll see that there will be quite a bit more legwork if you want to predict at new X the way you have it written down.
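
To make that concrete, here is a hedged sketch of the extra legwork for the gp.Latent model above, where the linear term lives outside the GP (X_new and indicator_new are hypothetical arrays of new inputs):

with model:
    f_new = gp.conditional("f_new", Xnew=X_new)
    # the GP knows nothing about the linear part, so it has to be re-assembled by hand
    mu_new = pm.Deterministic("mu_new", beta_intercept + f_new + beta_indicator * indicator_new)

With the linear term folded into the GP's mean function instead, gp.conditional takes care of it for you.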

I will say though that the type of sparse approximation in MarginalSparse likely won't work too well on a spatial dataset. When the data are scattered relative to the lengthscale, these sparse approximations can greatly over-smooth. They work best when the density of the data is very high relative to the lengthscale, as it is in this example. To see this, try modifying that example by making ℓ_true smaller and smaller (and maybe σ_true smaller as well, so the lengthscale variations are visible enough under the noise).
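
For instance (hypothetical values), you could shrink the simulation parameters like this and watch the sparse fit start to over-smooth:

ℓ_true = 0.2   # shorter lengthscale, so the data become sparse relative to it
σ_true = 0.5   # less noise, so the short-lengthscale variation is still visible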

Not that the sparse approximation isn't worth trying, but I'd check out this and this too.

Thanks for your help @bwengals. After looking through the links you provided and finding this helpful post, I think I figured out how to code it. The trick was telling the covariance function to use only selected input columns via active_dims.

Following the dataset simulation from the tutorial:

np.random.seed(1)
n = 200  # The number of data points
X = 10 * np.sort(np.random.rand(n))[:, None]
ℓ_true = 1.0
η_true = 3.0
cov_func = η_true ** 2 * pm.gp.cov.Matern52(1, ℓ_true)
mean_func = pm.gp.mean.Zero()
f_true = np.random.multivariate_normal(
    mean_func(X).eval(), cov_func(X).eval() + 1e-8 * np.eye(n), 1
).flatten()
σ_true = 2.0

# Add a binary (treatment) effect to the first half of the data points
beta_true = np.where(np.arange(n) < n / 2, 5, 0)
f_true = f_true + beta_true
y = f_true + σ_true * np.random.randn(n)

Fitting:

trt_status = 1 * (beta_true > 0)      # 0/1 treatment indicator
Z = np.column_stack([trt_status, X])  # column 0: indicator, column 1: spatial input

with pm.Model() as model:

    ℓ = pm.Gamma("ℓ", alpha=2, beta=1)
    η = pm.HalfCauchy("η", beta=5)
    # active_dims=[1] makes the covariance act only on the second column of Z (the spatial input)
    cov = η ** 2 * pm.gp.cov.Matern52(2, ℓ, active_dims=[1])

    beta = pm.Normal("beta", mu=0, sd=5)
    # the linear mean applies beta to the indicator column and 0 to the spatial column
    mean_fun = pm.gp.mean.Linear(coeffs=[beta, 0])

    Xu = pm.gp.util.kmeans_inducing_points(20, Z)
    gp = pm.gp.MarginalSparse(mean_func=mean_fun, cov_func=cov, approx="FITC")

    σ = pm.HalfCauchy("σ", beta=5)
    y_ = gp.marginal_likelihood("y_", X=Z, Xu=Xu, y=y, noise=σ)

    trace = pm.sample(1000)
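
Since the indicator effect now lives inside the GP's mean function, predicting at new locations should need no extra bookkeeping. A hedged sketch (Z_new is a hypothetical array with the same [indicator, x] column layout as Z):

with model:
    # the Linear mean is applied automatically when forming the conditional
    f_pred = gp.conditional("f_pred", Xnew=Z_new)
    pred_samples = pm.sample_posterior_predictive(trace, var_names=["f_pred"])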

Thanks also for the pointers about the sparse approaches and spatial data. Indeed, I think the density will be low relative to the lengthscale.
