Need for a review of my GP tutorial

essicolo · March 14, 2022, 8:39pm

Hi,

I created a multi-output Gaussian process tutorial with GeoPandas and PyMC3. There might be some points to correct or adjust. Any comments before I share it with the world?

ckrapu · March 14, 2022, 9:19pm

I think that’s a really nice integrated example. The only thing that I would like to see added is a map showing the predictive variance after fitting the GP. That way, you can show how the uncertainty is low close to the observed data locations and grows as we move away from those points.

essicolo · March 15, 2022, 12:45pm

Good suggestion. I pushed the changes. Thanks!

DanhPhan · March 16, 2022, 8:00am

Hi @essicolo, thank you for sharing.

This is a very nice example of the intrinsic model of coregionalization (ICM) using PyMC.

In the model, I am not sure whether * is a Kronecker product (⊗):

    ## Combined covariance
    cov_f = coreg * cov_feature # (B * K) or (B⊗K)?

There is a pymc.math.kron_dot function in pymc, but I am not sure it will work for this case.
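To make the question concrete, here is a minimal NumPy sketch (not the tutorial's code) of how an element-wise product of two kernels evaluated on stacked inputs relates to a Kronecker product. It assumes that multiplying two kernels multiplies their covariance matrices element-wise, and that the coregionalization kernel looks up B[i, j] from an output-index column. The data, the matrix B, and the helper exp_quad are hypothetical.

```python
import numpy as np

def exp_quad(x1, x2, lengthscale=1.0):
    """Squared-exponential kernel on 1-D inputs."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

# Hypothetical toy setup: 2 outputs observed at the same 3 locations.
x = np.array([0.0, 0.5, 2.0])
B = np.array([[1.0, 0.6],
              [0.6, 2.0]])  # coregionalization matrix (outputs x outputs)

# Stack the data output by output and build an output-index vector.
x_stacked = np.tile(x, 2)                # [x, x]
idx = np.repeat(np.arange(2), len(x))    # [0, 0, 0, 1, 1, 1]

# Evaluate each kernel on the stacked inputs, then multiply element-wise.
K_feature = exp_quad(x_stacked, x_stacked)   # 6 x 6
K_coreg = B[idx[:, None], idx[None, :]]      # 6 x 6, entry is B[i, j]
K_product = K_coreg * K_feature

# Kronecker product of the two small matrices.
K_kron = np.kron(B, exp_quad(x, x))

same = np.allclose(K_product, K_kron)
print(same)
```

Under these assumptions the two matrices coincide, but only because every output is observed at the same locations and the rows are stacked output by output. With different locations per output, the element-wise form still defines a covariance, while the Kronecker structure no longer applies.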

essicolo · March 16, 2022, 12:53pm

Hi @DanhPhan,

Thanks for the review. I tried modifying the code with both pymc.math.kron_dot and pymc.gp.cov.Kron. The former needs a parameter m and returns a numpy array, so I guess the latter should be used for a Kronecker product of kernels. But using it led to IndexError: index 1 is out of bounds for axis 1 with size 0, probably an error related to the size of the kernel inflated by the Kronecker product?

DanhPhan · March 20, 2022, 1:59am

Hi @essicolo, it seems that the current pymc.gp.cov.Coregion does not support multiple outputs yet. Unlike GPy's Coregionalize, the current implementation has no output_dim parameter.

As multitask GPs do not seem to be fully supported in pymc yet, you may also try other libraries such as GPy, GPflow (TensorFlow backend), GPyTorch (PyTorch backend), or the more recent mogptk (PyTorch backend).

Here is my simple implementation using GPyTorch with your data set, although I don’t like that it requires more boilerplate code.

essicolo · March 31, 2022, 2:34pm

Hi, thanks for the examples! Since multi-output is not yet built into PyMC, I rebooted the tutorial with GPflow (I’m not so good with PyTorch).

BioGoertz · April 1, 2022, 10:28am

@essicolo and @DanhPhan, you can indeed use PyMC for multi-output GPs. Basically you just need to add an extra input indicating which output you’re observing/predicting. See the small example here: Coregionalization model for two separable multidimensional Gaussian Process - #4 by bwengals. I’ve built it into my package for building GPs from tabular data, you can see it in action here and the code here.
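The "extra input" idea can be sketched in plain NumPy, without relying on any PyMC call: each observation becomes one row whose last column says which output it belongs to, and the covariance multiplies a kernel on the location by an entry of B = W Wᵀ + diag(kappa) selected by that column. The observations, W, kappa, and the helper icm_cov below are all hypothetical, and this is only an illustration of the data layout, not the linked example's implementation.

```python
import numpy as np

# Hypothetical observations: output 0 at three locations, output 1 at two.
x0, y0 = np.array([0.0, 1.0, 2.0]), np.array([0.1, 0.9, 0.2])
x1, y1 = np.array([0.5, 1.5]), np.array([1.2, 0.7])

# Long format: one row per observation, last column is the output index.
X = np.column_stack([
    np.concatenate([x0, x1]),
    np.concatenate([np.zeros(len(x0)), np.ones(len(x1))]),
])
y = np.concatenate([y0, y1])

# Coregionalization matrix B = W W^T + diag(kappa) (values are made up).
W = np.array([[1.0], [0.5]])
kappa = np.array([0.1, 0.1])
B = W @ W.T + np.diag(kappa)

def icm_cov(Xa, Xb, B, lengthscale=1.0):
    """Covariance between rows of Xa and Xb: B[i, j] * k(x, x')."""
    diff = Xa[:, [0]] - Xb[:, [0]].T
    k_x = np.exp(-0.5 * (diff / lengthscale) ** 2)   # kernel on locations
    ia = Xa[:, 1].astype(int)                         # output index per row
    ib = Xb[:, 1].astype(int)
    return B[ia[:, None], ib[None, :]] * k_x

K = icm_cov(X, X, B)
print(X.shape, K.shape)
```

Predicting one output at new locations then only requires building new rows with the same layout, for example locations paired with index 1, and calling icm_cov(X_new, X, B). The outputs do not need to share locations or sample sizes.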

DanhPhan · April 3, 2022, 7:26am

Hi @BioGoertz, which multi-output GP methods does your package currently support?

Multi-output GPs do not seem to be fully supported in pymc yet. For example, we are not able to mix different kernels (like ExpQuad and Matern32) in the Coregion regression model.

BioGoertz · April 3, 2022, 11:13am

My package uses PyMC3 as its only backend (for now, I hope to expand it to GPflow), so it can’t really do anything that PyMC3 can’t. I just implemented the bwengals example there (linear model of coregionalization, I believe?), but you’re right that it’s not immediately obvious how/whether you could use structurally distinct but correlated kernels for the two outputs.

BioGoertz · April 3, 2022, 11:59am

@essicolo For your priors on lengthscale and variance, I might suggest using zero-avoiding priors (Gamma or InvGamma) as recommended here: Robust Gaussian Processes in Stan. That should avoid some of the issues you saw with HalfCauchy. You can also take a principled approach to defining them by scaling them to the observed quantiles of your input and output data.
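As a rough illustration of what "zero-avoiding" means, the sketch below compares the three densities near zero using only the standard library. The parameter values are arbitrary choices for the comparison, not recommendations from the linked case study, and the Gamma is written in the shape/rate parameterization.

```python
import math

def half_cauchy_pdf(x, beta):
    """Half-Cauchy density with scale beta, for x >= 0."""
    return 2.0 / (math.pi * beta * (1.0 + (x / beta) ** 2))

def inv_gamma_pdf(x, alpha, beta):
    """Inverse-gamma density with shape alpha and scale beta, for x > 0."""
    log_pdf = (alpha * math.log(beta) - math.lgamma(alpha)
               - (alpha + 1.0) * math.log(x) - beta / x)
    return math.exp(log_pdf)

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and rate beta, for x > 0."""
    log_pdf = (alpha * math.log(beta) - math.lgamma(alpha)
               + (alpha - 1.0) * math.log(x) - beta * x)
    return math.exp(log_pdf)

# Arbitrary illustrative parameters (assumptions, not tuned to any data).
for x in (0.01, 0.1, 1.0):
    print(
        f"x={x:<5}"
        f" HalfCauchy(1)={half_cauchy_pdf(x, 1.0):.3g}"
        f" InvGamma(5, 5)={inv_gamma_pdf(x, 5.0, 5.0):.3g}"
        f" Gamma(2, 2)={gamma_pdf(x, 2.0, 2.0):.3g}"
    )
```

The half-Cauchy keeps substantial density as x approaches zero, whereas the inverse-gamma and a Gamma with shape greater than 1 both vanish there. That is the property that keeps a lengthscale or variance away from degenerate near-zero values.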

As for the “meaning” of the W matrix, I’d suggest watching this: Multi Output Gaussian Processes, Mauricio Alvarez - YouTube and/or reading this: http://gpss.cc/gpss17/slides/multipleOutputGPs.pdf. As far as I understand it (and I’m no expert), coregionalization uses a single latent GP and considers the different outputs to be linear combinations of different samples from this GP. The rank of W is essentially the number of distinct samples that are combined in this way, so (I think) higher rank → more samples → more idiosyncratic behavior between the outputs.
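A small NumPy sketch of the rank intuition, under the assumption that the outputs are built as linear combinations of independent draws from one latent GP, so that the between-output covariance is W Wᵀ (the kappa diagonal term is left out here to isolate the effect of W). The matrices W1 and W2 are made-up examples for three outputs.

```python
import numpy as np

def output_correlation(W):
    """Correlation between outputs implied by B = W W^T."""
    B = W @ W.T
    scale = np.sqrt(np.diag(B))
    return B, B / np.outer(scale, scale)

# Rank 1: every output is a rescaled copy of the same latent sample.
W1 = np.array([[1.0], [2.0], [-0.5]])
B1, corr1 = output_correlation(W1)

# Rank 2: outputs mix two independent latent samples.
W2 = np.array([[1.0, 0.0], [2.0, 1.0], [-0.5, 1.0]])
B2, corr2 = output_correlation(W2)

print(np.linalg.matrix_rank(B1), np.round(corr1, 3))
print(np.linalg.matrix_rank(B2), np.round(corr2, 3))
```

With one column in W, all pairwise correlations are exactly +1 or -1, so the outputs can differ only in scale and sign. Adding a second column lets the correlations take intermediate values, which matches the idea that higher rank allows more idiosyncratic behavior between outputs.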

Also, I’d recommend using a different color palette for the uncertainty vs the concentration.

It’s a great notebook!

essicolo · April 4, 2022, 5:42pm

@BioGoertz Thanks for the tips and links. I will update everything as soon as I have some time, and maybe try running the model with Gumbi!
