# Multivariate Gaussian Process Classification (GPC) and Prediction

I’m trying to implement a binary classification problem with multi-dimensional inputs using something like the following code:

```python
import pymc3 as pm
from scipy.special import expit
import numpy as np
import theano.tensor as tt

n = 40  # number of observations
N = 10  # number of predictors

# make up some fake data
y = np.random.binomial(1, expit(np.random.normal(size=n)))
X = np.random.normal(size=(n + 1, N))
X_new = X[-1:, :]  # held-out row used for prediction (kept 2-D)
X = X[:-1, :]

# model setup
with pm.Model() as model:
    logl = pm.Uniform('log(l)', lower=np.log(1e-5), upper=np.log(1e5))
    cov = pm.gp.cov.ExpQuad(N, ls=tt.exp(logl))  # covariance over all N input dims

    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior('f', X=X)
    p = pm.Deterministic('p', pm.math.invlogit(f))
    y_ = pm.Bernoulli('y', p=p, observed=y)
    trace = pm.sample(draws=2000, chains=2, cores=2, random_seed=1)
```

I would now like to predict a new value of `y` for the unseen input `X_new`; let's call it `y_new`. I would therefore expect `y_new` to be either 0 or 1 in this case. I would also like to estimate the probability of that value of `y_new` (the same as the `predict_proba` functionality in `sklearn.gaussian_process.GaussianProcessClassifier`). Is this possible? From what I have read, I think I need to do something like:

```python
with model:
    fcond = gp.conditional('fcond', X_new)
    ppc = pm.sample_posterior_predictive(trace=trace, vars=[fcond], samples=1000)
```

but I don’t really know how to interpret the output `ppc['fcond']`.

Many thanks.

`sample_posterior_predictive` generates simulated draws from the latent Gaussian process. We typically do this to characterize the fitted posterior GP (e.g. by plotting the draws). If you want to sample from the likelihood, add the `pred_noise=True` argument, which will then draw from the posterior predictive distribution.

To get inferences on the underlying probabilities, draw without `pred_noise` and take the inverse logit of those values.
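Concretely, if `ppc['fcond']` has shape `(samples, n_new)`, the inverse logit of each draw gives a posterior draw of the class-1 probability at each new point, and their mean is an estimate analogous to sklearn's `predict_proba`. A minimal NumPy sketch (the `fcond_samples` array below is a random stand-in for the actual `ppc['fcond']`):

```python
import numpy as np


def expit(x):
    """Inverse logit (logistic sigmoid)."""
    return 1.0 / (1.0 + np.exp(-x))


# Stand-in for ppc['fcond']: 1000 posterior draws of the latent f at one new point.
rng = np.random.default_rng(1)
fcond_samples = rng.normal(loc=1.0, scale=0.5, size=(1000, 1))

# Inverse logit maps latent draws to draws of P(y_new = 1).
p_samples = expit(fcond_samples)

# Posterior mean probability: a point estimate analogous to predict_proba.
p_new = p_samples.mean(axis=0)

# A 94% credible interval summarizes the uncertainty in that probability.
p_lo, p_hi = np.percentile(p_samples, [3, 97], axis=0)

# Hard 0/1 prediction by thresholding the posterior mean probability.
y_new = (p_new > 0.5).astype(int)
```

The same code works unchanged for several new points at once; each column of `p_samples` is then the probability posterior for one row of `X_new`.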


Thanks for your response. I have successfully taken the inverse logit of my ppc samples from the `Latent` conditional distribution, but `sample_posterior_predictive` does not have a `pred_noise` option, and neither does `pm.gp.Latent.conditional()`. That argument only exists for `pm.gp.Marginal.conditional()` (and `MarginalSparse`, `MarginalKron`). Am I misunderstanding?
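For what it's worth, with `pm.gp.Latent` the effect of `pred_noise=True` can be reproduced by hand: push each conditional draw through the inverse logit and then take one Bernoulli draw per probability draw, which amounts to sampling `y_new` from the likelihood. A hedged NumPy sketch, where `fcond_draws` is again a random stand-in for the real `ppc['fcond']`:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for ppc['fcond']: 1000 posterior draws of the latent GP at 3 new points.
fcond_draws = rng.normal(loc=[-2.0, 0.0, 2.0], scale=0.5, size=(1000, 3))

# Inverse logit gives per-draw success probabilities...
p_draws = 1.0 / (1.0 + np.exp(-fcond_draws))

# ...and one Bernoulli draw per probability draw is a sample of y_new from
# the likelihood, i.e. what pred_noise=True would give for a Marginal GP.
y_draws = rng.binomial(1, p_draws)

# Column means are Monte Carlo estimates of P(y_new = 1) at each new point.
p_hat = y_draws.mean(axis=0)
```

Averaging the Bernoulli draws and averaging `p_draws` directly both estimate the same predictive probability; the explicit `y_draws` are useful when you want full posterior predictive samples of the labels themselves.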