GP and Bayesian linear regression in PyMC

nikzad · August 8, 2018, 2:48am

I have the problem of how to combine a bayesian linear regression and gaussian process. As an example, I have two features, a 2D spatial GP trying to fit a model as:

y \sim \alpha+\beta_0 x_0 + \beta_1 x_1 + GP_{spatial}

Created the following code in PyMC but still not sure about the implementation:

X = data[['x1', 'x2']].T.values
X_gp = data[['gp0', 'gp1']].values
y = data.y.values

with pm3.Model() as model_mlr:
    ## Bayesian linear regression
    α_blr = pm3.Normal('α_blr', mu=0, sd=10)
    β_blr = pm3.Normal('β_blr', mu=0, sd=1, shape=num_features)
    σ = pm3.HalfCauchy('σ', 5)
    μ_blr = α_blr + pm3.math.dot(β_blr, X)

    ## The spatial GP
    η_spatial_trend = pm3.Uniform('η_spatial_trend', lower=0, upper=100)
    ℓ_spatial_trend = pm3.Uniform('ℓ_spatial_trend', lower=0, upper=100, shape=gp_input_dim)
    cov_spatial_trend = (
        η_spatial_trend**2 * pm3.gp.cov.ExpQuad(input_dim=gp_input_dim, ls=ℓ_spatial_trend)
    )
    gp_spatial_trend = pm3.gp.Latent(cov_func=cov_spatial_trend)
    gp_spatial_func = gp_spatial_trend.prior('gp_spatial_func', X=X_gp)
    
    # noise model
    cov_noise = pm3.gp.cov.WhiteNoise(σ)

    y_hat = pm3.Normal(
        'y_hat', 
        mu=μ_blr + gp_spatial_func, 
        sd=σ, 
        observed=y)

    start = None
    step = pm3.Metropolis()
    trace_mlr = pm3.sample(30000, start=start, step=step, burn=10000)
    chain = trace_mlr[-1000:]

Should I use gp.Latent or gp.Marginal?
How prediction works in this case?
when I use gp.Latent, I get complain about not converging.

Here is a sample data

sample_data.csv (37.8 KB)

bwengals · August 8, 2018, 9:58pm

Yes you should be able to use Marginal here, since your observed variable has a normal distribution. Put your BLR code into the Linear mean function. Your model is equivalent to

y \sim \mathcal{GP}(\alpha + \beta_0 x_0 + \beta_1 x_1\,, K(x_{sp}\,, x_{sp}')) + \epsilon

where \alpha + \beta_0 x_0 + \beta_1 x_1 is the mean function. Is your data on an equispaced two dimensional grid? If so, check out the MarginalKron implementation.

nikzad · August 9, 2018, 3:17am

Thanks. I tried:

gp\_spatial\_trend = pm3.gp.Marginal(mean\_func=pm3.gp.mean.Linear(μ\_blr), cov\_func=cov\_spatial\_trend)

and then

y\_hat = gp\_spatial\_trend.marginal\_likelihood("y\_hat", X=X\_gp, y=y, noise=cov\_noise)

but getting ValueError: shapes (400,2) and (400,) not aligned: 2 (dim 1) != 400 (dim 0).
BTW, if we change from Normal distribution to Poisson, how to rewrite the original code?

bwengals · August 9, 2018, 3:50am

If you use a Poisson distribution, you will need to use Latent. The naming of marginal_likelihood refers to the fact that your observed y variable is distributed as

p(y \mid X) = \int p(y \mid f, X) p(f \mid X) df

where f is the GP being marginalized out. This integral is possible if p(y \mid f, X) is normal, since it is conjugate with p(f \mid X), which is the distribution over the Gaussian process prior. This is why Latent uses the naming ‘prior’.

Check the documentation of Linear, it’s inputs are the coefficients, and the intercept. So your call
should look more like pm.gp.mean.Linear(coeffs=beta). The X you use will need to have four columns, [x_0, x_1, x_{gp1}, x_{gp2}]. The following is pseudocode:

X = [x_0, x_1, x_{gp1}, x_{gp2}]
beta = [beta_0, beta_1, 0, 0]  # the last two columns are full of zeros
mean_func = pm.gp.mean.Linear(coeffs=beta)
# note `active_dims`.  the cov function only uses the last two columns of `X`.
# `input_dim=4` because X has four columns
cov_func = pm.gp.cov.ExpQuad(input_dim=4, ls=ls, active_dims=[2,3]) 

# for normally distributed y
gp = pm.gp.Marginal(mean_func=mean_func, cov_func=cov_func)
y_ = gp.marginal_likelihood("y", X=X, y=y, noise=sigma)

# for poisson distributed y
gp = pm.gp.Latent(mean_func=mean_func, cov_func=cov_func)
f = gp.prior("f", X=X)
y_ = pm.Poisson("y", mu=tt.exp(f), observed=y)

Also, I noticed you were using the Metropolis sampler. Using NUTS is highly recommended for something like this, which is what the default call of sample uses.

nikzad · August 14, 2018, 9:50pm

Thanks

nikzad · August 17, 2018, 3:35am

It was a life saving. Thanks again.
When I use Latent + Posision with NUTS (5000 samples) I receive this warning:

logp = -1,074.5, ||grad|| = 1.295: 100%|██████████| 194/194 [00:07&lt;00:00, 24.94it/s] Multiprocess 
sampling (2 chains in 2 jobs) NUTS: [f_rotated_, ℓ_spatial_trend, η_spatial_trend, σ, β_blr1, β_blr0, α_blr] 
Sampling 2 chains: 100%|██████████| 5000/5000 [1:39:06&lt;00:00, 1.19draws/s] The gelman-rubin 
statistic is larger than 1.4 for some parameters. The sampler did not converge. The estimated number of effective 
samples is smaller than 200 for some parameters.

I guess since the number of samples is high the kernel function is too big and hence sampling is hard and too time consuming (or it may need more samples). Am I right?

bwengals · August 17, 2018, 4:39pm

What was your call to sample here?

nikzad · August 18, 2018, 7:42am

Missed this part:

start = pm3.find_MAP()
step = pm3.NUTS(scaling=start)
trace_mlr = pm3.sample(2000, start=start, step=step, njobs=2)
chain_latent = trace_mlr[-1000:]

junpenglao · August 18, 2018, 9:35am

You should not use find_MAP to get initial starting point and scaling for NUTS, that usually is a very bad idea.
You should always start with the default:

with model:
    trace = pm.sample(2000, tune=1000, cores=2)

Also, burn-in and thinning is usually not needed using NUTS - if the trace looks like it needs burn-in, you should increase the tuning, if the trace looks like it needs thinning, you should further reparameterized.

Topic		Replies	Views
How do you formulate a `Latent` GP that is equivalent to the `Marginal` implementation? Questions	5	493	May 26, 2021
Marginal likelihood implementation and measurement uncertainty Questions	12	865	November 2, 2019
Multiple (uncertain) function observations of the same Gaussian process Questions	15	6546	October 16, 2017
Sparse GP with linear covariate? Questions	2	604	August 3, 2021
GP with multiplicative factor Questions	4	604	August 31, 2018

GP and Bayesian linear regression in PyMC

Related topics