GP and Bayesian linear regression in PyMC


#1

I have the problem of how to combine a Bayesian linear regression with a Gaussian process. As an example, with two features and a 2D spatial GP, I am trying to fit a model of the form:

y \sim \alpha + \beta_0 x_0 + \beta_1 x_1 + GP_{spatial}

I created the following code in PyMC3, but I am still not sure about the implementation:

import pandas as pd
import pymc3 as pm3

data = pd.read_csv('sample_data.csv')   # the attached sample data

X = data[['x1', 'x2']].T.values      # (2, n), for the dot product below
X_gp = data[['gp0', 'gp1']].values   # (n, 2) spatial GP inputs
y = data.y.values

num_features = X.shape[0]      # 2 linear features
gp_input_dim = X_gp.shape[1]   # 2 spatial input dimensions

with pm3.Model() as model_mlr:
    ## Bayesian linear regression
    α_blr = pm3.Normal('α_blr', mu=0, sd=10)
    β_blr = pm3.Normal('β_blr', mu=0, sd=1, shape=num_features)
    σ = pm3.HalfCauchy('σ', 5)
    μ_blr = α_blr + pm3.math.dot(β_blr, X)

    ## The spatial GP
    η_spatial_trend = pm3.Uniform('η_spatial_trend', lower=0, upper=100)
    ℓ_spatial_trend = pm3.Uniform('ℓ_spatial_trend', lower=0, upper=100, shape=gp_input_dim)
    cov_spatial_trend = (
        η_spatial_trend**2 * pm3.gp.cov.ExpQuad(input_dim=gp_input_dim, ls=ℓ_spatial_trend)
    )
    gp_spatial_trend = pm3.gp.Latent(cov_func=cov_spatial_trend)
    gp_spatial_func = gp_spatial_trend.prior('gp_spatial_func', X=X_gp)
    
    # noise model (not used by the likelihood below; σ enters y_hat directly)
    cov_noise = pm3.gp.cov.WhiteNoise(σ)

    y_hat = pm3.Normal(
        'y_hat', 
        mu=μ_blr + gp_spatial_func, 
        sd=σ, 
        observed=y)

    start = None
    step = pm3.Metropolis()
    # `tune` is the PyMC3 keyword for burn-in draws
    trace_mlr = pm3.sample(30000, start=start, step=step, tune=10000)
    chain = trace_mlr[-1000:]

  • Should I use gp.Latent or gp.Marginal?

  • How does prediction work in this case?
    When I use gp.Latent, I get complaints about not converging.

Here is some sample data:

sample_data.csv (37.8 KB)


#2

Yes, you should be able to use Marginal here, since your observed variable has a normal distribution. Put your BLR code into the Linear mean function. Your model is equivalent to

y \sim \mathcal{GP}(\alpha + \beta_0 x_0 + \beta_1 x_1\,, K(x_{sp}\,, x_{sp}')) + \epsilon

where \alpha + \beta_0 x_0 + \beta_1 x_1 is the mean function. Is your data on an equispaced two-dimensional grid? If so, check out the MarginalKron implementation; a minimal sketch follows.
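To illustrate (this is a sketch under assumptions, not code from the thread: the coordinate vectors, priors, and names are all placeholders), MarginalKron takes one covariance function per grid dimension and a list of the per-dimension inputs:

import numpy as np
import pymc3 as pm

# hypothetical 1D coordinates whose Cartesian product forms the grid
x1s = np.linspace(0, 1, 20)[:, None]   # (20, 1)
x2s = np.linspace(0, 1, 20)[:, None]   # (20, 1)
y = np.random.randn(20 * 20)           # stand-in for the gridded observations,
                                       # ordered to match the Kronecker structure

with pm.Model() as model_kron:
    ℓ1 = pm.Gamma('ℓ1', alpha=2, beta=1)
    ℓ2 = pm.Gamma('ℓ2', alpha=2, beta=1)
    σ = pm.HalfCauchy('σ', 5)
    # the full covariance is the Kronecker product of the two factors
    cov1 = pm.gp.cov.ExpQuad(1, ls=ℓ1)
    cov2 = pm.gp.cov.ExpQuad(1, ls=ℓ2)
    gp = pm.gp.MarginalKron(cov_funcs=[cov1, cov2])
    y_ = gp.marginal_likelihood('y', Xs=[x1s, x2s], y=y, sigma=σ)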


#3

Thanks. I tried:

gp_spatial_trend = pm3.gp.Marginal(mean_func=pm3.gp.mean.Linear(μ_blr), cov_func=cov_spatial_trend)

and then

y_hat = gp_spatial_trend.marginal_likelihood("y_hat", X=X_gp, y=y, noise=cov_noise)

but I am getting ValueError: shapes (400,2) and (400,) not aligned: 2 (dim 1) != 400 (dim 0).
BTW, if we change from a Normal distribution to a Poisson, how should the original code be rewritten?


#4

If you use a Poisson distribution, you will need to use Latent. The naming of marginal_likelihood refers to the fact that your observed y variable is distributed as

p(y \mid X) = \int p(y \mid f, X) p(f \mid X) df

where f is the GP being marginalized out. This integral is tractable when p(y \mid f, X) is normal, since it is conjugate with p(f \mid X), the Gaussian process prior. This is why Latent uses the naming ‘prior’.
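Concretely, with m(\cdot) the mean function, K the covariance function, and \sigma the noise standard deviation, the marginalization gives the familiar closed form

p(y \mid X) = \mathcal{N}\left(m(X)\,, K(X, X) + \sigma^2 I\right)

which is exactly what Marginal implements.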

Check the documentation of Linear: its inputs are the coefficients and the intercept. So your call
should look more like pm.gp.mean.Linear(coeffs=beta). The X you use will need to have four columns, [x_0, x_1, x_{gp1}, x_{gp2}]. The following sketches it (beta_blr, ls, and sigma stand for the priors from your model):

import numpy as np
import theano.tensor as tt

# four columns: the two linear features, then the two GP inputs
X = np.column_stack([x_0, x_1, x_gp1, x_gp2])

# pad the coefficients with zeros so the mean function ignores the GP columns
beta = tt.concatenate([beta_blr, tt.zeros(2)])
mean_func = pm.gp.mean.Linear(coeffs=beta)

# note `active_dims`.  the cov function only uses the last two columns of `X`.
# `input_dim=4` because X has four columns
cov_func = pm.gp.cov.ExpQuad(input_dim=4, ls=ls, active_dims=[2, 3])

# for normally distributed y
gp = pm.gp.Marginal(mean_func=mean_func, cov_func=cov_func)
y_ = gp.marginal_likelihood("y", X=X, y=y, noise=sigma)

# for poisson distributed y
gp = pm.gp.Latent(mean_func=mean_func, cov_func=cov_func)
f = gp.prior("f", X=X)
y_ = pm.Poisson("y", mu=tt.exp(f), observed=y)

Also, I noticed you were using the Metropolis sampler. NUTS is highly recommended for a model like this, and it is what the default call of sample uses.
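As for the prediction question: with Marginal, after fitting you can evaluate the GP at new inputs with conditional and then draw posterior predictive samples. A minimal sketch, assuming the Normal-likelihood model above is wrapped in a pm.Model() context named model, and Xnew is a hypothetical array laid out with the same four columns as X:

with model:
    trace = pm.sample(1000, tune=1000)  # defaults to NUTS where possible
    # noisy predictions at the new inputs; with Latent you would instead use
    # f_pred = gp.conditional("f_pred", Xnew) and push it through the likelihood
    y_pred = gp.conditional("y_pred", Xnew, pred_noise=True)
    pred_samples = pm.sample_posterior_predictive(trace, var_names=["y_pred"])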