[PyMCon Web Series 02] An introduction to multi-output Gaussian processes using PyMC (Feb 21, 2023) (Danh Phan)

An Introduction to Multi-Output Gaussian Processes using PyMC

Speaker: Danh Phan

Event type: Live webinar
Date: Feb 21st 2023 (subscribe here for email updates)
Time: 22:00 UTC
Register for the event on Meetup to get the Zoom link
Talk Code Repository: On GitHub
Web App: Interest rate prediction for the US, AU, and UK

NOTE: The event is recorded. Subscribe to the PyMC YouTube for notifications.

Sponsor

We thank our sponsors for supporting PyMC and the PyMCon Web Series. If you would like to sponsor us, contact us for more information.

Mistplay is the #1 Loyalty Program for mobile gamers, with over 20 million users worldwide. Millions of gamers use our platform to discover games, connect with friends, and earn awesome rewards. We are a fast-growing, profitable company, recently ranked as the 3rd fastest-growing technology company in Canada. Our passion for innovation drives our growth across the industry through the development of new apps, powerful ad tech tools, and the recent launch of a publishing division for mobile games.

Mistplay is hiring for a Senior Data Scientist (Remote or Montreal, QC).

Content

Video: Interview with Danh Phan (7 minutes)

Video: Intro to Multi-Output Gaussian Processes Using PyMC

Welcome to the second event of the PyMCon Web Series! As part of this series, most events will have an async component and a live talk.

In this case, Danh, as part of the async component, prepared a full repository for the community to engage with before the talk. It includes multiple Colab notebooks and a PDF slide deck.

Take a look before the talk, post your questions below, and be prepared for the discussion.

Abstract of the talk

Multi-output Gaussian processes have recently gained strong attention from researchers and have become an active research topic in multi-task machine learning. Their advantage is the capacity to simultaneously learn and infer many outputs that share a common source of uncertainty in the inputs.

This talk shows audiences how to build multi-output Gaussian processes in PyMC. It first introduces the concepts of Gaussian processes (GPs) and multi-output GPs and how they can address real problems in several domains. It then shows how to implement multi-output GP models, such as the intrinsic coregionalization model (ICM) and the linear model of coregionalization (LCM), in Python using PyMC with real-world datasets.

The talk aims to get users up and running with GPs quickly, especially multi-output GPs, using PyMC. Several examples with time-series datasets illustrate different GP features. The presentation will help users leverage GPs to analyze their own data effectively.
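To give a flavor of the ICM construction the talk covers, here is a minimal PyMC sketch (v5-style API). The toy data, priors, and variable names are illustrative, not the speaker’s actual code; the real notebooks are in the talk repository linked above. The trick is to append the output index as an extra input column and multiply a kernel over the inputs by a `Coregion` kernel over that index.

```python
import numpy as np
import pymc as pm

# Toy data: two correlated time series driven by a shared latent function.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 30)
f = np.sin(x)
y0 = f + 0.1 * rng.standard_normal(x.size)
y1 = 0.8 * f + 0.1 * rng.standard_normal(x.size)

n_outputs = 2
# ICM trick: append the output index (0..n_outputs-1) as an extra column.
X = np.column_stack([
    np.tile(x, n_outputs),
    np.repeat(np.arange(n_outputs), x.size),
])
y = np.concatenate([y0, y1])

with pm.Model():
    # Kernel over the actual input (column 0 only).
    ell = pm.Gamma("ell", alpha=2.0, beta=1.0)
    eta = pm.HalfNormal("eta", sigma=1.0)
    k_input = eta**2 * pm.gp.cov.ExpQuad(2, ls=ell, active_dims=[0])

    # Coregionalization kernel over the output index (column 1):
    # B = W @ W.T + diag(kappa) encodes the between-output covariance.
    W = pm.Normal("W", mu=0.0, sigma=1.0, shape=(n_outputs, 1))
    kappa = pm.Gamma("kappa", alpha=2.0, beta=1.0, shape=n_outputs)
    k_task = pm.gp.cov.Coregion(2, W=W, kappa=kappa, active_dims=[1])

    gp = pm.gp.Marginal(cov_func=k_input * k_task)
    sigma = pm.HalfNormal("sigma", sigma=0.5)
    gp.marginal_likelihood("y", X=X, y=y, sigma=sigma)

    idata = pm.sample(target_accept=0.9)
```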


Here’s a link to Danh’s interview ahead of the event. Looking forward to seeing you all there!

A question about how important the joint normality assumption is:

I have a data set where I feel good about the marginal distributions of y_1, y_2, y_3 being Gaussian (I can model them separately just fine), but I’m not sure about them being jointly normal, similar to the teardrop shape in multivariate analysis - Is it possible to have a pair of Gaussian random variables for which the joint distribution is not Gaussian? - Cross Validated, due to some heteroskedasticity or something.

[Image: examples of bivariate distributions with standard normal marginals]

Can choosing a good kernel for the cross-covariance deal with something like this, or would I need some other strategy in general for a combined model?

@nfultz It’s not GPs but this is something copulas are built to handle

edit: just clicked your link to find that it’s explicitly from copulas. Disregard.
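For anyone reading along later, the counterexample behind that Cross Validated link is easy to verify numerically. A minimal sketch (plain NumPy, nothing GP-specific): reflect the tails of a standard normal; each marginal stays exactly N(0, 1) by symmetry, but the joint is clearly not Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
c = 1.54  # any positive threshold works for the counterexample
x = rng.standard_normal(200_000)
# Reflect the tails: y is still exactly N(0, 1) marginally,
# but (x, y) is not jointly Gaussian.
y = np.where(np.abs(x) <= c, x, -x)

print(y.mean(), y.std())      # ~0 and ~1: the marginal looks standard normal
# If (x, y) were jointly Gaussian, x + y would be Gaussian too; instead
# x + y is exactly 0 whenever |x| > c, i.e. a point mass at zero.
print(np.mean(x + y == 0.0))  # clearly positive probability
```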


I’m curious, for the baseball example, what benefits from a business standpoint come from modeling this at all?

Looking at it, it seems like a rolling average would probably do fine to smooth the data. And looking at the right side of the plots, the GPs are not going to extrapolate well. You can fill in between games, but I’m not sure that’s useful. So I’m curious what value the modeling process here actually brings.

For the baseball example, is there a benefit to using multi-output vs. single-output?

Was thinking there was a way to use the Xs to get a fan or teardrop shape on Ys. Not sure if it would work.

Thank you @fonnesbeck.
So, is it OK to infer that multi-output GPs have a similar effect on interpolation as multilevel models have on inference?
i.e., they help in areas where some groups (or outputs, in the GP case) have less data.

As presented, perhaps not much, but it can be helpful when you have players with varying amounts of data. If we know there are correlations among the players, then we can get better inferences for players that have fewer games.
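To make that concrete: in the ICM sketch earlier in the thread, the information sharing lives in the coregionalization matrix B = W Wᵀ + diag(κ). A quick, illustrative way to inspect it after sampling (continuing from that toy model, so `idata`, `W`, and `kappa` are the names used there):

```python
import numpy as np

# Posterior-mean estimate of the between-output covariance B.
W_hat = idata.posterior["W"].mean(("chain", "draw")).to_numpy()
kappa_hat = idata.posterior["kappa"].mean(("chain", "draw")).to_numpy()
B = W_hat @ W_hat.T + np.diag(kappa_hat)

# Convert to correlations: large off-diagonals mean outputs with little
# data borrow strength from the better-observed ones.
corr = B / np.sqrt(np.outer(np.diag(B), np.diag(B)))
print(corr)
```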

You never want to extrapolate beyond the lengthscale of the GP, that’s true. If you were interested in extrapolation, you would either have another function to use as the GP’s mean function, or have an additive kernel that includes a component with a longer lengthscale.
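A rough sketch of both options in PyMC (an additive kernel with a long-lengthscale component, plus a parametric mean function); the data, priors, and names are made up for illustration:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)[:, None]
y = 0.3 * x.ravel() + np.sin(x.ravel()) + 0.1 * rng.standard_normal(50)

with pm.Model():
    # Short lengthscale captures local wiggles; the long-lengthscale
    # component keeps predictions from reverting to the mean right away.
    ls_short = pm.Gamma("ls_short", alpha=2.0, beta=2.0)
    ls_long = pm.Gamma("ls_long", alpha=5.0, beta=0.5)
    cov = (pm.gp.cov.Matern52(1, ls=ls_short)
           + pm.gp.cov.ExpQuad(1, ls=ls_long))

    # Alternative (or complement): put the trend in the mean function,
    # so extrapolation follows the trend rather than decaying to zero.
    b0 = pm.Normal("b0", mu=0.0, sigma=1.0)
    b1 = pm.Normal("b1", mu=0.0, sigma=1.0, shape=1)
    mean = pm.gp.mean.Linear(coeffs=b1, intercept=b0)

    gp = pm.gp.Marginal(mean_func=mean, cov_func=cov)
    sigma = pm.HalfNormal("sigma", sigma=0.5)
    gp.marginal_likelihood("y", X=x, y=y, sigma=sigma)
```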

I think my solution is lurking in this slide somewhere:

[slide image]

Can we modify \sigma^2 to vary with the location of x_i rather than being fixed? And maybe allow correlations between y_{1j} and y_{2j} instead of the zeros on the off-diagonal blocks?
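A sketch of the first idea, using pm.gp.Latent with an explicit likelihood so the noise scale can depend on x_i (the log-linear noise model and all priors here are just illustrative choices, not from the slides):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 60)[:, None]
# Synthetic data whose noise grows with x.
y = np.sin(x.ravel()) + (0.05 + 0.04 * x.ravel()) * rng.standard_normal(60)

with pm.Model():
    ell = pm.Gamma("ell", alpha=2.0, beta=1.0)
    gp = pm.gp.Latent(cov_func=pm.gp.cov.ExpQuad(1, ls=ell))
    f = gp.prior("f", X=x)

    # Noise sd as a log-linear function of x: no single fixed sigma^2.
    a = pm.Normal("a", mu=-2.0, sigma=1.0)
    b = pm.Normal("b", mu=0.0, sigma=1.0)
    sigma_x = pm.math.exp(a + b * x.ravel())

    pm.Normal("y", mu=f, sigma=sigma_x, observed=y)
```

For the second question, one option is to replace the independent Normal likelihood with a pm.MvNormal over each (y_{1j}, y_{2j}) pair, giving the 2x2 noise covariance nonzero off-diagonals (e.g., via pm.LKJCholeskyCov).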