# GP and Bayesian linear regression in PyMC

I have a question about how to combine Bayesian linear regression with a Gaussian process. As an example, with two features and a 2D spatial GP, I am trying to fit a model of the form:

y \sim \alpha+\beta_0 x_0 + \beta_1 x_1 + GP_{spatial}

I created the following code in PyMC3, but I am still not sure about the implementation:

```python
X = data[['x1', 'x2']].T.values
X_gp = data[['gp0', 'gp1']].values
y = data.y.values

with pm3.Model() as model_mlr:
    ## Bayesian linear regression
    α_blr = pm3.Normal('α_blr', mu=0, sd=10)
    β_blr = pm3.Normal('β_blr', mu=0, sd=1, shape=num_features)
    σ = pm3.HalfCauchy('σ', 5)
    μ_blr = α_blr + pm3.math.dot(β_blr, X)

    ## The spatial GP
    η_spatial_trend = pm3.Uniform('η_spatial_trend', lower=0, upper=100)
    ℓ_spatial_trend = pm3.Uniform('ℓ_spatial_trend', lower=0, upper=100, shape=gp_input_dim)
    # e.g. an amplitude-scaled ExpQuad kernel (this expression was left blank in the original post)
    cov_spatial_trend = (
        η_spatial_trend**2
        * pm3.gp.cov.ExpQuad(gp_input_dim, ls=ℓ_spatial_trend)
    )
    gp_spatial_trend = pm3.gp.Latent(cov_func=cov_spatial_trend)
    gp_spatial_func = gp_spatial_trend.prior('gp_spatial_func', X=X_gp)

    # noise model
    cov_noise = pm3.gp.cov.WhiteNoise(σ)

    y_hat = pm3.Normal(
        'y_hat',
        mu=μ_blr + gp_spatial_func,
        sd=σ,
        observed=y)

    start = None
    step = pm3.Metropolis()
    # `burn` is not a pm3.sample() argument; `tune` is the discarded warm-up
    trace_mlr = pm3.sample(30000, start=start, step=step, tune=10000)
    chain = trace_mlr[-1000:]
```
chain = trace_mlr[-1000:]


• Should I use gp.Latent or gp.Marginal?

• How does prediction work in this case? When I use gp.Latent, I get complaints about the sampler not converging.

Here is some sample data:

sample_data.csv (37.8 KB)

Yes, you should be able to use Marginal here, since your observed variable has a normal distribution. Put your BLR code into a Linear mean function. Your model is equivalent to

y \sim \mathcal{GP}(\alpha + \beta_0 x_0 + \beta_1 x_1\,, K(x_{sp}\,, x_{sp}')) + \epsilon

where \alpha + \beta_0 x_0 + \beta_1 x_1 is the mean function. Is your data on an equispaced two dimensional grid? If so, check out the MarginalKron implementation.
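As a quick numerical check of what "put the BLR into the mean function" means, here is a plain numpy sketch with made-up toy data: the BLR mean and a single linear mean over the stacked four-column input agree once the coefficients on the spatial columns are zeroed (this mirrors what `pm.gp.mean.Linear(coeffs=..., intercept=...)` computes, namely `intercept + X @ coeffs`):

```python
import numpy as np

# Toy data (made up for illustration): two regression features, two spatial coords.
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=(n, 2))        # regression features x0, x1
x_sp = rng.uniform(size=(n, 2))    # spatial coordinates (only the kernel uses these)

alpha = 1.5
beta = np.array([0.3, -0.7])

# BLR mean, computed directly.
mu_blr = alpha + x @ beta

# The same mean as a linear function of the stacked input [x0, x1, sp0, sp1],
# with zero coefficients on the spatial columns.
X = np.hstack([x, x_sp])
coeffs = np.concatenate([beta, np.zeros(2)])
mu_lin = alpha + X @ coeffs

assert np.allclose(mu_blr, mu_lin)
```

The kernel then only ever sees the spatial columns (via active_dims), so the regression and the spatial trend stay cleanly separated even though they share one input matrix.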

Thanks. I tried:

```python
gp_spatial_trend = pm3.gp.Marginal(mean_func=pm3.gp.mean.Linear(μ_blr), cov_func=cov_spatial_trend)
```

and then

```python
y_hat = gp_spatial_trend.marginal_likelihood("y_hat", X=X_gp, y=y, noise=cov_noise)
```

but I am getting ValueError: shapes (400,2) and (400,) not aligned: 2 (dim 1) != 400 (dim 0).
By the way, if we change from a Normal distribution to a Poisson, how should the original code be rewritten?

If you use a Poisson distribution, you will need to use Latent. The naming of marginal_likelihood refers to the fact that your observed y variable is distributed as

p(y \mid X) = \int p(y \mid f, X) p(f \mid X) df

where f is the GP being marginalized out. This integral is possible if p(y \mid f, X) is normal, since it is conjugate with p(f \mid X), which is the distribution over the Gaussian process prior. This is why Latent uses the naming ‘prior’.
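Concretely, because both factors are Gaussian, the integral has a closed form: with mean function m, kernel K, and noise standard deviation \sigma,

p(y \mid X) = \mathcal{N}\big(m(X)\,, K(X, X) + \sigma^2 I\big)

which is exactly the density that marginal_likelihood evaluates. With a Poisson likelihood there is no such conjugate closed form, so the latent f must be kept in the model and sampled.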

Check the documentation of Linear: its inputs are the coefficients and the intercept. So your call should look more like pm.gp.mean.Linear(coeffs=beta). The X you use will need to have four columns, [x_0, x_1, x_{gp1}, x_{gp2}]. The following is pseudocode:

```python
X = [x_0, x_1, x_gp1, x_gp2]
beta = [beta_0, beta_1, 0, 0]  # the last two entries are zero
mean_func = pm.gp.mean.Linear(coeffs=beta)
# note active_dims: the cov function only uses the last two columns of X.
# input_dim=4 because X has four columns
cov_func = eta**2 * pm.gp.cov.ExpQuad(input_dim=4, ls=ls, active_dims=[2, 3])

# for normally distributed y
gp = pm.gp.Marginal(mean_func=mean_func, cov_func=cov_func)
y_ = gp.marginal_likelihood("y", X=X, y=y, noise=sigma)

# for Poisson distributed y
gp = pm.gp.Latent(mean_func=mean_func, cov_func=cov_func)
f = gp.prior("f", X=X)
y_ = pm.Poisson("y", mu=tt.exp(f), observed=y)
```
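On the earlier question of how prediction works: with Marginal you add gp.conditional("f_pred", Xnew) to the model (or call gp.predict), and with Latent you add gp.conditional and push the sampled f through the link function. Under the hood, the Marginal case is the standard GP conditional. Here is a minimal numpy sketch of that math, assuming a zero mean and a unit-amplitude ExpQuad kernel on toy data:

```python
import numpy as np

def expquad(A, B, ls=1.0):
    # Squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / ls**2)

rng = np.random.default_rng(1)
X = rng.uniform(size=(30, 2))      # observed spatial inputs
y = np.sin(X.sum(axis=1))          # toy observations
Xnew = rng.uniform(size=(5, 2))    # prediction points
sigma = 0.1                        # observation noise std

Kyy = expquad(X, X) + sigma**2 * np.eye(len(X))
Ksy = expquad(Xnew, X)

# Predictive mean:       K*y (Kyy)^{-1} y   (plus the mean function, zero here)
# Predictive covariance: K** - K*y (Kyy)^{-1} K*y^T
mu_pred = Ksy @ np.linalg.solve(Kyy, y)
cov_pred = expquad(Xnew, Xnew) - Ksy @ np.linalg.solve(Kyy, Ksy.T)
```

With a non-zero linear mean, the mean function is evaluated at Xnew and added to mu_pred, which is why Xnew must have the same four-column layout as the training X.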


Also, I noticed you were using the Metropolis sampler. NUTS is highly recommended for a model like this, and it is what the default call to sample uses.


Thanks

It was a life saver, thanks again.
When I use Latent + Poisson with NUTS (5000 samples) I receive this warning:

```
logp = -1,074.5, ||grad|| = 1.295: 100%|██████████| 194/194 [00:07<00:00, 24.94it/s]
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [f_rotated_, ℓ_spatial_trend, η_spatial_trend, σ, β_blr1, β_blr0, α_blr]
Sampling 2 chains: 100%|██████████| 5000/5000 [1:39:06<00:00, 1.19draws/s]
The gelman-rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
```


I guess that since the number of samples is high, the kernel matrix is large, and hence sampling is hard and too time-consuming (or it may need more samples). Am I right?

What was your call to sample here?

Missed this part:

```python
start = pm3.find_MAP()
step = pm3.NUTS(scaling=start)
trace_mlr = pm3.sample(2000, start=start, step=step, njobs=2)
chain_latent = trace_mlr[-1000:]
```


You should not use find_MAP to get an initial starting point and scaling for NUTS; that is usually a very bad idea. Let sample use its default NUTS initialization instead:

```python
with model:
    # default NUTS with automatic initialization and tuning
    trace_mlr = pm3.sample(2000, tune=1000)
```