Extending Gaussian Process functionality: Coregion and beyond

jordan-melendez · October 21, 2017, 11:55pm

Let’s discuss what steps would be required to extend the current GP api in some meaningful ways, given the upcoming MatrixNormal distribution. Some possibilities include

Coregion kernels (as in GPflow or GPy, which uses a mixed noise likelihood. Does that have any benefit over MatrixNormal?)
Marginal2D
Multiple curve observations of same GP (works without MatrixNormal, but needs testing)
I personally enjoy plot_gp_dist and would like to see if it could be modified to handle the above points
Other ideas?

These features would help in my research, so this is partly a selfish plea for your help, but I really do think that PyMC3 would benefit from these additions. This post is just meant to spur discussion on these points.

junpenglao · October 22, 2017, 6:44am

These are all great suggestions. One question: what do you mean by Marginal2D? We can already do Marginal2D by unpacking the y (latent) into 1D.

jordan-melendez · October 22, 2017, 4:35pm

That is a good question. That was an idea, along with Latent2D, raised by @bwengals in this reply. I wasn’t entirely clear on it either, so maybe he can elaborate.

bwengals · October 24, 2017, 5:17pm

Sorry for my slowness, been a busy week… should clear up soon.

I completely agree that PyMC3 would benefit from these additions. There have already been a few people on discourse asking about stuff that could be helped by this. The whole point is to benefit peoples research and work!

For the reasons @junpenglao said, probably not the best name. How about MarginalKron and LatentKron to emphasize that the implementation relies on a Kronecker structured covariance matrix? I think the implementation of LatentKron should follow Latent nearly exactly, except with appropriate input args and MatrixNormal instead of MvNormal. MarginalKron will be the same when the covariance structure is $K_1 \otimes K_2$. If we want to also have the structure $K_1 \otimes K_2 + \sigma^2 I$, where the noise is diagonal across the whole covariance matrix, then we need to use the eigendecomposition instead of Cholesky (ie Ch 5 here).

I need to think a bit more about a Coregion covariance to say more, but from what I understand I don’t think there should be too much difficulty.

jordan-melendez · October 26, 2017, 3:30pm

@bwengals that thesis is a good resource. Does theano provide anything like the Kronecker matrix-vector and Kronecker matrix-matrix products, defined as kron_mvprod and kron_mmprod respectively in the thesis?

I assume that it would be nice to allow for an arbitrary number of Kronecker products, right? If so, will we need something more than just a MatrixNormal? Possibly a KroneckerNormal or something that takes advantage of kron_mvprod or kron_mmprod described above when computing the exponential and some determinant tricks for the prefactor.

bwengals · October 26, 2017, 5:14pm

Unfortunately theano does not. Would have to implement it.

I think it would be useful, though i bet that a vast majority of cases are covered by MatrixNormal. If youre making a spatial model, you just need x and y. Maybe you could have time too? But past that its harder to anticipate usecases. The amount of work to implement the general version might be more than the utility it brings. Im for sticking with MatrixNormal for now, and we can see if theres demand for the general case?

jordan-melendez · October 27, 2017, 9:32pm

You are probably right, and I am not opposed to sticking with MatrixNormal. I may be in the 0.1% that could benefit from the generic case, so I may dig deeper and see what I can do, but who knows if anything useful for PyMC3 will come of it. For reference, this page implements some nice Kronecker tricks for GPflow and also defines a Kronecker GP class, so some subset of this may be of relevance here.

bwengals · October 27, 2017, 9:43pm

Oh! nice find. It’s certainly relevant. Ill look more at it. I thought for your case you just needed MatrixNormal? Do you have more stuff coming that would benefit from the generic case?

jordan-melendez · October 27, 2017, 10:01pm

So in my case I have multiple realizations of GPs, some of which have 2D inputs. The total matrix is then the Kronecker product of the covariance between curves with the Kronecker product of the two input covariances. Using MatrixNormal would work for this if I unpack the y into 1D, but the speed increases from the general case would allow me to cover more of the input space. Who knows, it could be premature optimization on my end!

bwengals · October 27, 2017, 10:06pm

Are your 2D inputs on a grid? If not then you couldn’t use Kronecker structure anyway. You can just as well do something like $k(x1, x2 ; x1’, x2’) \otimes k(x3, x3’)$, where the x3 input would be indexing your different GP realizations, and the x1, x2 inputs would be the 2D inputs.

jordan-melendez · October 27, 2017, 10:17pm

I can choose my inputs to be on a grid and can make them as fine or as course as needed. I essentially have access to continuous functions that are very cheap to sample from. Your suggestion would work, it would just involve inverting a larger matrix than is necessary.

jordan-melendez · November 21, 2017, 10:37pm

I am resurrecting this discussion because I have made some progress on these fronts:

Kronecker functionality is here! If Theano wizards can improve on kron_dot, etc., I’m all ears.
Thus, a KroneckerNormal is now possible and is also here.
MarginalKron too, though it requires testing and is not quite finished (currently getting a weird error when jobs != 1 which is AttributeError: Can't pickle local object 'EighGrad.__init__.<locals>.<lambda>').
And Coregion kernel is here, (shamelessly stolen from GPy) but hasn’t been tested.

Tests and further improvements are on their way.

junpenglao · November 22, 2017, 6:14am

That’s usually because you have a lambda function in the code and it can not be pickle for njobs, eg see https://github.com/pymc-devs/pymc3/blob/master/pymc3/gp/gp.py#L640-L642

cc @bwengals

jordan-melendez · November 22, 2017, 12:16pm

I don’t see any lambdas, but it looks like the problem only occurs when using eigenvalue decomposition, which happens inside KroneckerNormal. It could be somewhere in this block, but I haven’t managed to fix it yet.

junpenglao · November 22, 2017, 12:27pm

Yeah that is likely. Submit a PR when you are ready, maybe @aseyboldt or @ferrine know a better way to do that instead of *map

ferrine · November 22, 2017, 12:44pm

Don’t think problem is here. It is here

jordan-melendez · November 22, 2017, 2:37pm

Good catch @ferrine. Eigendecomposition is required to add diagonal noise to the covariance matrix. Could someone submit a PR to Theano to switch to partial or would we have to find a workaround on our end?

junpenglao · November 22, 2017, 4:23pm

What we did before is work out a fix locally and submit a PR to theano, eg see here https://github.com/pymc-devs/pymc3/blob/master/pymc3/distributions/dist_math.py#L240-L241

jordan-melendez · November 25, 2017, 2:22pm

PR for Kronecker stuff started here.

bwengals · November 25, 2017, 9:58pm

I’m really looking forward to seeing this stuff in action. There’ve been several questions that have come up on discourse where this sort of stuff would have helped people out.

@junpenglao that pickling fix replacing the lambda with functools.partial was made by @aseyboldt, I believe.

Topic		Replies	Views
Need for a review of my GP tutorial Sharing	11	1374	April 4, 2022
Modeling Gaussian process from multiple function observations? Questions	1	887	March 14, 2018
Adding Gaussian Processes in PyMC4 this summer! Development	7	1795	March 19, 2020
[PyMCon Web Series 03b] Gaussian Processes Live Q&A (Mar 22, 2023) PyMCon Web Series gaussian_process	2	499	April 13, 2023
Hierarchical Nearest Neighbor Gaussian Process Questions development	5	1297	November 1, 2019

Extending Gaussian Process functionality: Coregion and beyond

Related topics