GP kernel dependent on latent variables

I have two somewhat different problems that I’d like to approach with a GP prior, but I’m wondering whether the kernels can depend on latent variables, or at least uncertain variables, instead of just fixed X values. I’m trying to determine whether this is doable in PyMC3.

The first case: I’d like to use a J ~ GP prior for some 2D tomography reconstruction, but the physics of the problem suggests that the kernel should depend on the value of a linear transform of J at each given location X = (x, y). Is that even possible?

The second case: I’d like to use y ~ GP to do a regression on some 2D data X = (loc, phase) with a combination of several non-isotropic kernels to fit different features of the behavior. However, the X data has some uncertainty attached. I understand that I could simply allow for larger uncertainty in y, on the assumption that the errors in X propagate into y, but that seems sub-optimal. Essentially I’m looking for a way to use a GP for an errors-in-independent-variables style fit.

I think it is likely possible, as long as the latent variables for the kernel can be expressed as some Theano function, or the GP kernel can be expressed as a combination of standard GP kernels.

Did you have a look at the documentation of new GP module?
http://docs.pymc.io/examples.html#gaussian-processes

Yes, I had a look at both GP versions. The new one says that Latent.prior accepts an array_like X, but not much beyond that. The docs for the older version are gone, but I don’t remember them being very detailed either. I haven’t checked the source yet.

@junpenglao, do you think both the cases are possible, or just the second one?

The new version is much more flexible, so you are safe to dig in :slight_smile:

I do think both cases are possible, but I would need a bit more information. Essentially, if you can write down a model to simulate some realistic fake data, you can fit it. In my experience, in 2D or above, the prior becomes very important.

@smartass101 Case 2 is pretty straightforward. The docs are purposely vague (‘array-like’) so that things like that are allowed. Take a look at this example. The inducing point locations Xu have a prior distribution. You can treat the actual inputs in a similar fashion. For uncertain inputs, you could do something like:

X_mu = np.linspace(0, 10, 100)
# give the true input locations a prior centered on the measured values
X = pm.Normal("X", mu=X_mu, sd=0.05, shape=len(X_mu))
# be sure to reshape X into a column vector before passing it to the GP (if it's 1D)
f = gp.prior("f", X[:, None])

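For completeness, a minimal sketch of how that might sit inside a full model; the kernel choice, priors, noise level, and fake data here are just placeholders, not anything specific to your problem:

import numpy as np
import pymc3 as pm

# fake 1D data: noisy observations at uncertain input locations
X_mu = np.linspace(0, 10, 100)
y_obs = np.sin(X_mu) + 0.1 * np.random.randn(100)

with pm.Model() as model:
    # prior over the true (latent) input locations
    X = pm.Normal("X", mu=X_mu, sd=0.05, shape=len(X_mu))

    # standard GP pieces; ExpQuad is only an example kernel
    ls = pm.Gamma("ls", alpha=2, beta=1)
    cov = pm.gp.cov.ExpQuad(1, ls=ls)
    gp = pm.gp.Latent(cov_func=cov)

    # the GP prior is evaluated at the uncertain inputs
    f = gp.prior("f", X=X[:, None])

    sigma = pm.HalfNormal("sigma", sd=0.5)
    y = pm.Normal("y", mu=f, sd=sigma, observed=y_obs)

    trace = pm.sample(1000, tune=1000)

Be aware that sampling the latent f jointly with 100 uncertain inputs will be slow; that’s the price of a full errors-in-variables treatment.
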
Regarding case 1, this does require more thought, but it is possible too. Take a look at the Gibbs and WarpedInput covariance functions in the docs; they were designed for these types of scenarios. The input warping you do or the lengthscale function you define will need to be implemented in Theano, of course.
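
To make the warping idea a bit more concrete, here is a rough sketch of plugging a Theano warping function into WarpedInput; the warp itself (and its parameter a) is a made-up placeholder, not the transform your physics dictates:

import pymc3 as pm
import theano.tensor as tt

def warp_func(x, a):
    # placeholder warp; any Theano expression of x (and extra parameters) works
    return x + a * tt.tanh(x)

with pm.Model() as model:
    a = pm.HalfNormal("a", sd=1.0)
    ls = pm.Gamma("ls", alpha=2, beta=1)
    cov_base = pm.gp.cov.ExpQuad(1, ls=ls)
    # the base kernel is evaluated on the warped inputs
    cov = pm.gp.cov.WarpedInput(1, cov_func=cov_base, warp_func=warp_func, args=(a,))
    gp = pm.gp.Latent(cov_func=cov)

The Gibbs covariance works analogously, except you supply a lengthscale function instead of an input warp.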

@bwengals re case 1 (tomography): I had a look at those covariance functions, but they depend only on X, I’d like to find a way to include some y(X) dependence as well. I’m just not sure the GP framework supports such a hierarchy.