Let’s discuss what steps would be required to extend the current GP API in some meaningful ways, given the upcoming MatrixNormal distribution. Some possibilities include:
- Coregion kernels (as in GPflow or GPy, which use a mixed-noise likelihood. Does that have any benefit over MatrixNormal?)
- Marginal2D
- Multiple curve observations of the same GP (works without MatrixNormal, but needs testing)
- I personally enjoy plot_gp_dist and would like to see if it could be modified to handle the above points
- Other ideas?
These features would help in my research, so this is partly a selfish plea for your help, but I really do think that PyMC3 would benefit from these additions. This post is just meant to spur discussion on these points.
That is a good question. That was an idea, along with Latent2D, raised by @bwengals in this reply. I wasn’t entirely clear on it either, so maybe he can elaborate.
Sorry for my slowness, been a busy week… should clear up soon.
I completely agree that PyMC3 would benefit from these additions. There have already been a few people on discourse asking about things that this would help with. The whole point is to benefit people’s research and work!
For the reasons @junpenglao said, probably not the best name. How about MarginalKron and LatentKron to emphasize that the implementation relies on a Kronecker-structured covariance matrix? I think the implementation of LatentKron should follow Latent nearly exactly, except with appropriate input args and MatrixNormal instead of MvNormal. MarginalKron will be the same when the covariance structure is $K_1 \otimes K_2$. If we want to also have the structure $K_1 \otimes K_2 + \sigma^2 I$, where the noise is diagonal across the whole covariance matrix, then we need to use the eigendecomposition instead of Cholesky (i.e., Ch. 5 here).
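The reason the eigendecomposition handles the $K_1 \otimes K_2 + \sigma^2 I$ case is that the eigenvalues of a Kronecker product are the products of the factor eigenvalues, so the diagonal noise just shifts them. A small NumPy check of that identity (illustrative only, not the PyMC3 implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(n):
    # Random symmetric positive-definite matrix for the demo.
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

K1, K2 = random_psd(3), random_psd(4)
sigma2 = 0.5

# Eigendecompose each small factor separately (3x3 and 4x4, never 12x12).
w1, Q1 = np.linalg.eigh(K1)
w2, Q2 = np.linalg.eigh(K2)

# Eigenvalues of K1 (x) K2 are all pairwise products of factor eigenvalues,
# so the diagonal noise term just adds sigma^2 to each of them.
w = np.kron(w1, w2) + sigma2

# Compare against the dense 12x12 construction.
K = np.kron(K1, K2) + sigma2 * np.eye(12)
assert np.allclose(np.sort(w), np.linalg.eigvalsh(K))

# The log-determinant for the MVN prefactor then comes for free.
assert np.allclose(np.log(w).sum(), np.linalg.slogdet(K)[1])
```

The eigenvectors are likewise $Q_1 \otimes Q_2$, which is what lets the solve in the exponent avoid forming the full matrix.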
I need to think a bit more about a Coregion covariance to say more, but from what I understand I don’t think there should be too much difficulty.
@bwengals that thesis is a good resource. Does Theano provide anything like the Kronecker matrix-vector and Kronecker matrix-matrix products, defined as kron_mvprod and kron_mmprod respectively in the thesis?
I assume that it would be nice to allow for an arbitrary number of Kronecker products, right? If so, will we need something more than just a MatrixNormal? Possibly a KroneckerNormal or something that takes advantage of kron_mvprod or kron_mmprod described above when computing the exponential and some determinant tricks for the prefactor.
Unfortunately, Theano does not. We would have to implement it.
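For reference, here is a NumPy sketch of what such a kron_mvprod could look like (the name and algorithm follow the thesis; this is illustrative, not a Theano implementation). It multiplies $(K_1 \otimes \cdots \otimes K_D)\,b$ without ever building the full Kronecker product, by applying one square factor at a time and rotating the axes:

```python
import numpy as np

def kron_mvprod(Ks, b):
    """Compute (K1 (x) K2 (x) ... (x) KD) @ b without forming the full
    Kronecker product. Assumes each K in Ks is square (as for covariances)."""
    x = np.asarray(b)
    N = x.size
    for K in Ks:
        d = K.shape[0]
        # Apply this factor along the leading axis, then move that axis to
        # the back so the next factor acts on its own dimension.
        x = (K @ x.reshape(d, N // d)).T.reshape(-1)
    return x

# Check against the dense product for a small case.
rng = np.random.default_rng(1)
K1, K2, K3 = (rng.standard_normal((n, n)) for n in (2, 3, 4))
b = rng.standard_normal(24)
assert np.allclose(kron_mvprod([K1, K2, K3], b),
                   np.kron(np.kron(K1, K2), K3) @ b)
```

The cost is $O(N \sum_d n_d)$ instead of $O(N^2)$ for the dense matrix-vector product, which is where the speedups for the general case would come from.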
I think it would be useful, though I bet that the vast majority of cases are covered by MatrixNormal. If you’re making a spatial model, you just need x and y. Maybe you could have time too? But past that it’s harder to anticipate use cases. The amount of work to implement the general version might be more than the utility it brings. I’m for sticking with MatrixNormal for now, and we can see if there’s demand for the general case?
You are probably right, and I am not opposed to sticking with MatrixNormal. I may be in the 0.1% that could benefit from the generic case, so I may dig deeper and see what I can do, but who knows if anything useful for PyMC3 will come of it. For reference, this page implements some nice Kronecker tricks for GPflow and also defines a Kronecker GP class, so some subset of this may be of relevance here.
Oh! Nice find. It’s certainly relevant. I’ll look more at it. I thought for your case you just needed MatrixNormal? Do you have more stuff coming that would benefit from the generic case?
So in my case I have multiple realizations of GPs, some of which have 2D inputs. The total matrix is then the Kronecker product of the covariance between curves with the Kronecker product of the two input covariances. Using MatrixNormal would work for this if I unpack the y into 1D, but the speed increases from the general case would allow me to cover more of the input space. Who knows, it could be premature optimization on my end!
Are your 2D inputs on a grid? If not, then you couldn’t use Kronecker structure anyway. You can just as well do something like $k(x_1, x_2; x_1', x_2') \otimes k(x_3, x_3')$, where the $x_3$ input would index your different GP realizations, and the $x_1, x_2$ inputs would be the 2D inputs.
I can choose my inputs to be on a grid and can make them as fine or as coarse as needed. I essentially have access to continuous functions that are very cheap to sample from. Your suggestion would work; it would just involve inverting a larger matrix than is necessary.
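To illustrate why the grid matters: on a Cartesian grid, a product kernel’s covariance factors exactly into a Kronecker product of the per-axis covariances, which is the structure everything above exploits. A small NumPy check with a squared-exponential kernel (function names here are illustrative):

```python
import numpy as np

def se(a, b, ell=1.0):
    # 1D squared-exponential kernel matrix between point sets a and b.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

x1 = np.linspace(0.0, 1.0, 3)
x2 = np.linspace(0.0, 2.0, 4)
# Full input set: the Cartesian product of the per-axis points, ordered
# to match NumPy's row-major Kronecker convention.
X = np.array([(a, b) for a in x1 for b in x2])

# Dense 12x12 covariance from the 2D product kernel...
K_full = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
# ...equals the Kronecker product of the two small 1D covariances.
assert np.allclose(K_full, np.kron(se(x1, x1), se(x2, x2)))
```

Off the grid this factorization no longer holds, so the dense covariance is the only option.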
I am resurrecting this discussion because I have made some progress on these fronts:
- Kronecker functionality is here! If Theano wizards can improve on kron_dot, etc., I’m all ears.
- Thus, a KroneckerNormal is now possible and is also here.
- MarginalKron too, though it requires testing and is not quite finished (currently getting a weird error when jobs != 1: AttributeError: Can't pickle local object 'EighGrad.__init__.<locals>.<lambda>').
- A Coregion kernel is here (shamelessly stolen from GPy), but it hasn’t been tested.
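For those following along, the intrinsic coregionalization form that GPy uses is roughly $B = W W^T + \mathrm{diag}(\kappa)$, indexed by integer output labels. A minimal NumPy sketch under that assumption (names are illustrative, not the PR’s API):

```python
import numpy as np

def coregion(Xi, Zi, W, kappa):
    # Coregionalization matrix B = W W^T + diag(kappa), looked up at the
    # integer output labels in Xi and Zi (ICM form, as in GPy).
    B = W @ W.T + np.diag(kappa)
    Xi = np.asarray(Xi, dtype=int)
    Zi = np.asarray(Zi, dtype=int)
    return B[Xi[:, None], Zi[None, :]]

W = np.array([[1.0], [0.5]])   # rank-1 mixing weights for 2 outputs
kappa = np.array([0.1, 0.2])   # per-output independent variances
idx = np.array([0, 1, 0])      # output label for each observation
K = coregion(idx, idx, W, kappa)
```

The rank of W controls how much the outputs share, and kappa keeps each output’s own variance positive even at rank 1.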
I don’t see any lambdas, but it looks like the problem only occurs when using eigenvalue decomposition, which happens inside KroneckerNormal. It could be somewhere in this block, but I haven’t managed to fix it yet.
Good catch @ferrine. Eigendecomposition is required to add diagonal noise to the covariance matrix. Could someone submit a PR to Theano to switch to partial, or would we have to find a workaround on our end?
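A minimal reproduction of the issue, in case it helps whoever writes the PR: a lambda defined inside a method lives in a local scope, so pickle (and hence multiprocessing with jobs != 1) fails with exactly the "Can't pickle local object" error above, while functools.partial over a module-level function pickles fine. Class names here are made up for the demo:

```python
import pickle
from functools import partial

def scale(x, alpha):
    # Module-level function: pickle can find it by reference.
    return alpha * x

class GradLambda:
    def __init__(self, alpha):
        # Locally defined lambda: pickling the instance fails with
        # "Can't pickle local object 'GradLambda.__init__.<locals>.<lambda>'".
        self.fn = lambda x: alpha * x

class GradPartial:
    def __init__(self, alpha):
        # functools.partial over a module-level function pickles fine.
        self.fn = partial(scale, alpha=alpha)

try:
    pickle.dumps(GradLambda(2.0))
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False

restored = pickle.loads(pickle.dumps(GradPartial(2.0)))
```

So the fix on the Theano side should just be replacing the lambda with a partial of a top-level function.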
I’m really looking forward to seeing this stuff in action. There’ve been several questions that have come up on discourse where this sort of stuff would have helped people out.
@junpenglao that pickling fix replacing the lambda with functools.partial was made by @aseyboldt, I believe.