# Coregionalization model for two separable multidimensional Gaussian Processes

Dear Community,
I want to model a joint Gaussian process with two response variables using a linear combination of two-dimensional covariance functions (i.e. a linear coregionalization spatial process, or a bivariate geostatistical model).

My idea is similar to this one: Multi-output gaussian processes. However, I have had problems with the “multiplication” because, as I understand it, it uses the Kronecker product.
The model I’m trying to implement is described in (Schmidt & Gelfand, eq. 1).

The model I want to implement is the following:
Let S1 and S2 be two GPs with covariance `gp.cov.Matern32(input_dim=2)`,
and then use the sum of two Kronecker products of the coregionalization and covariance matrices.
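Concretely (this is my reading of the linear model of coregionalization), the covariance I want is

$$
\operatorname{Cov}\big(Y(s),\, Y(s')\big) \;=\; A_1 A_1^{\top}\,\rho_1(s,s') \;+\; A_2 A_2^{\top}\,\rho_2(s,s'),
$$

where each $A_j A_j^{\top}$ is a $2\times 2$ coregionalization matrix and $\rho_j$ is a Matérn 3/2 correlation on the 2-d coordinates, so that over the full stacked data the covariance is a sum of Kronecker products $\sum_j A_j A_j^{\top} \otimes K_j$.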

Now, I first tried to use a `tt.tensor` array (2x2) to act as the coregionalization matrix,
then compute the Kronecker product (as a covariance function) and sum a similar term.

This is the code:

```python
with pm.Model() as model:
    ## Building the covariance structure for the first process
    sigma_p = pm.HalfNormal('sigma_p', sd=3)
    # phi_p = pm.Uniform('phi_p', 0, 15)
    phi_p = pm.HalfNormal('phi_p', sd=1)
    cov_p = sigma_p * pm.gp.cov.Matern32(2, phi_p, active_dims=[0, 1])

    ## Building the covariance structure for the second process
    sigma_s = pm.HalfNormal('sigma_s', sd=3)
    # phi_s = pm.Uniform('phi_s', 0, 15)
    phi_s = pm.HalfNormal('phi_s', sd=1)
    cov_s = sigma_s * pm.gp.cov.Matern32(2, phi_s, active_dims=[0, 1])

    ## Build the coregionalization matrix
    A = pm.Normal(name='a', mu=0, sd=1, shape=(2, 2))

    ### Let's try to do the exercise with the Kronecker product
    K1 = pm.gp.cov.Kron(factor_list=[A, cov_p])
    K2 = pm.gp.cov.Kron(factor_list=[A, cov_s])
    K = K1 + K2

    gp = pm.gp.Latent(cov_func=K)

    f = gp.prior("latent_field", X=TX.values, reparameterize=False)

    y_obs = pm.Bernoulli('y_obs', logit_p=f, observed=Y)
```

However, I got an error saying that the FreeRV variable has no attribute `input_dims`, meaning that the only way to make `gp.cov.Kron` work is with a `factor_list` composed of ONLY covariance functions (i.e. `gp.cov.Covariance` instances).

I then tried the `gp.cov.Coregion` covariance function, but it seems to work only when `input_dim=1`, i.e. it only accepts 1-d datasets:
`ValueError: Coregion requires exactly one dimension to be active`

So my question is: do you think it’s possible to achieve this with the current implementation of the GP API?

Do I need to extend it to cover multidimensional linear coregionalizations?

The `gp.cov.Coregion` covariance is required to have `input_dim=1` because the X variable it takes as input
is the index into the elements of the matrix `B`. This indexing constructs the linear combinations that underlie the coregionalization model. There are no restrictions on the number of inputs for the underlying process covariance, so your `cov_p` and `cov_s` should be OK. You should add a column to your `X` that indexes which of the outputs the corresponding `y` variable belongs to, and set that column as the `active_dims` for `gp.cov.Coregion`.
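To make the indexing concrete, here is a plain-NumPy sketch (not PyMC code) of what `Coregion` computes, assuming the standard parameterization `B = W @ W.T + diag(kappa)`; the values of `W` and `kappa` below are made up for illustration:

```python
import numpy as np

# Assumed example values: 2 outputs, B = W W^T + diag(kappa)
W = np.array([[1.0, 0.5],
              [0.2, 1.0]])
kappa = np.array([0.1, 0.3])
B = W @ W.T + np.diag(kappa)  # 2x2 positive-definite coregionalization matrix

# The active column of X simply holds the integer output index of each row
idx = np.array([0, 0, 1, 1])

# Coregion evaluates B at those indices: K[i, j] = B[idx[i], idx[j]]
K = B[np.ix_(idx, idx)]
print(K.shape)  # (4, 4)
```

So the "inputs" to `Coregion` are never coordinates, only output labels, which is why it must be one-dimensional.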


Thank you @bwengals. However, I haven’t been able to make it work. It appears to me that it’s not possible to use `Coregion` together with the Kronecker product.
This is the code that I’m trying to run:

```python
phi_pt = 40
cov_p = pm.gp.cov.Matern32(2, phi_pt, active_dims=[0, 1])
phi_st = 15
cov_s = pm.gp.cov.Matern32(2, phi_st, active_dims=[2, 3])

## coregionalization matrix
a = np.random.normal(size=4).reshape(2, 2)
A = pm.gp.cov.Coregion(2, active_dims=[4], B=a)
K1 = pm.gp.cov.Kron(factor_list=[cov_p, A])
```

I’ve concatenated the coordinates and the outputs into corresponding columns,
i.e. outputs y_1 and y_2 into Y, and coords_1 and coords_2 into coords. I’ve created a new column `idx` indicating 0 when y belongs to y_1 and 1 when y belongs to y_2.
However, I’ve found that the splitting method in `gp.cov.Kron` does not work properly with this configuration: I get an `IndexError` when evaluating `K1(X).eval()`.
X has shape (456, 5).
I’ve also duplicated the coords column to be consistent with the Kron covariance function.

So it looks like a Kronecker-based implementation isn’t available in PyMC3 (PR welcome!), but you can implement the model without it, in a way that doesn’t take advantage of that computational speedup. Here is an example of a coregionalization GP model that should help get you going. Notice that you won’t have any issues with multidimensional inputs; they’ll work as usual.
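The idea behind the non-Kronecker route is that multiplying the coregionalization covariance by the input covariance gives, entry by entry, `K[i, j] = B[o_i, o_j] * k(x_i, x_j)` over the stacked data. A rough NumPy sketch of that product (with a hand-written squared-exponential stand-in for `ExpQuad` and an assumed `B`):

```python
import numpy as np

def sqexp(x1, x2, ls=1.0):
    # illustrative squared-exponential kernel on 1-d inputs
    # (stand-in for pm.gp.cov.ExpQuad, not the PyMC implementation)
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ls**2)

n = 5
x = np.linspace(0, 1, n)
# stack the two outputs: column 0 is the input, column 1 is the output index
X = np.concatenate([np.c_[x, np.zeros(n)],
                    np.c_[x, np.ones(n)]])

# an assumed 2x2 coregionalization matrix (B = W W^T + diag(kappa) in the model)
B = np.array([[1.0, 0.6],
              [0.6, 2.0]])

idx = X[:, 1].astype(int)
# elementwise product: K[i, j] = B[idx_i, idx_j] * k(x_i, x_j)
K = B[np.ix_(idx, idx)] * sqexp(X[:, 0], X[:, 0])
print(K.shape)  # (10, 10)
```

This is exactly what `coreg * cov` builds in the model below, at O((2n)^2) cost instead of exploiting Kronecker structure.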

#### generate and then plot the data

```python
import numpy as np
import pymc3 as pm
import matplotlib.pyplot as plt

# set the seed
np.random.seed(1)

n = 100
x = np.linspace(0, 10, n)[:, None]

# true covariance function and its true parameters
ell_true = 1.0
eta_true = 3.0
cov_func = eta_true**2 * pm.gp.cov.Matern52(1, ell_true)
mean_func = pm.gp.mean.Zero()

# two samples from the same gaussian process
f_true1 = np.random.multivariate_normal(mean_func(x).eval(),
                                        cov_func(x).eval() + 1e-8*np.eye(n), 1).flatten()
f_true2 = np.random.multivariate_normal(mean_func(x).eval(),
                                        cov_func(x).eval() + 1e-8*np.eye(n), 1).flatten()

sigma_true = 2.0
f1 = 0.5*f_true1 - 1.0*f_true2
f2 = 1.0*f_true1 + 0.2*f_true2
y1 = f1 + sigma_true * np.random.randn(n)
y2 = f2 + sigma_true * np.random.randn(n)

## Plot the data and the unobserved latent functions
fig = plt.figure(figsize=(12, 5)); ax = fig.gca()
ax.plot(x, f1, "dodgerblue", lw=3, label="True f1");
ax.plot(x, f2, "tomato", lw=3, label="True f2");
ax.plot(x, y1, 'ok', ms=3, alpha=0.5, label="Data 1");
ax.plot(x, y2, 'ok', ms=3, alpha=0.5, label="Data 2");
ax.set_xlabel("x"); ax.set_ylabel("The true f(x)"); plt.legend();
```

#### Format the data for the coregionalized GP

```python
xx = np.concatenate((x, x), axis=0)
idx = np.concatenate((np.zeros(n), np.ones(n)))[:, None]
X = np.concatenate((xx, idx), axis=1)

y = np.concatenate((y1, y2))
X.shape, y.shape
```

#### pymc3 model

```python
with pm.Model() as model:
    eta = pm.Gamma("eta", alpha=2, beta=0.5)
    ell = pm.Gamma("ell", alpha=2, beta=0.5)
    cov = eta**2 * pm.gp.cov.ExpQuad(1, ls=ell, active_dims=[0])

    W = pm.Normal("W", mu=0, sd=3, shape=(2, 2), testval=np.random.randn(2, 2))
    kappa = pm.Gamma("kappa", alpha=1.5, beta=1, shape=2)
    coreg = pm.gp.cov.Coregion(input_dim=2, active_dims=[1], kappa=kappa, W=W)
    cov_func = coreg * cov

    sigma = pm.HalfNormal("sigma", sd=3)
    gp = pm.gp.Marginal(cov_func=cov_func)
    y_ = gp.marginal_likelihood("f", X, y, noise=sigma)
```

#### MAP estimate and generate some predictions

```python
with model:
    mp = pm.find_MAP()

x_new = np.linspace(0, 20, 200)[:, None]
xx_new = np.concatenate((x_new, x_new), axis=0)
idx2 = np.concatenate((np.zeros(200), np.ones(200)))[:, None]
X_new = np.concatenate((xx_new, idx2), axis=1)

with model:
    f_pred = gp.conditional("f_pred", X_new)

# To use the MAP values, you can just replace the trace with a length-1 list containing `mp`
with model:
    pred_samples = pm.sample_ppc([mp], vars=[f_pred], samples=300)
```

#### plot

```python
from pymc3.gp.util import plot_gp_dist

fig = plt.figure(figsize=(12, 5)); ax = fig.gca()

# plot the samples from the gp posterior with samples and shading
f_pred = pred_samples["f_pred"]
plot_gp_dist(ax, f_pred[:, :200], X_new[:200, 0], palette="Blues", fill_alpha=0.1, samples_alpha=0.1);
plot_gp_dist(ax, f_pred[:, 200:], X_new[200:, 0], palette="Reds", fill_alpha=0.1, samples_alpha=0.1);
```
