[PyMCon Web Series 04] Scalable Bayesian Modeling (Mar 28, 2023) (Sandra Yojana Meneses)

[PyMCon Web Series] Scalable Bayesian Modeling

Speaker: Sandra Yojana Meneses

Event type: Live webinar
Date: March 28, 2023
Time: 16:00 UTC
Register for the event on Meetup
Talk Code Repository: On GitHub

Content

Welcome to the fourth event of the PyMCon Web Series! As part of this series, most events will have an async component and a live talk.

Blog Post

Sandra has written a blog post, which is accompanied by a full repository where you can test out the performance of your models. You may find the notebook here, which includes both the code and instructions for getting things up and running on Colab.

Try out your models and share the results in this thread

Recording of the synchronous Q&A

Sandra's Interview

Sponsor

We thank our sponsors for supporting PyMC and the PyMCon Web Series. If you would like to sponsor us, contact us for more information.

Mistplay is the #1 Loyalty Program for mobile gamers - with over 20 million users worldwide. Millions of gamers use our platform to discover games, connect with friends, and earn awesome rewards. We are a fast-growing, profitable company, recently ranked as the 3rd fastest growing technology company in Canada. Our passion for innovation drives our growth across the industry with the development of new apps, powerful ad tech tools, and the recent launch of a publishing division for mobile games.

Mistplay is hiring for a Senior Data Scientist (Remote or Montreal, QC).


Links shared in the Zoom chat

PyMCon Web Series CFP info: https://pymcon.com/cfp/

This event is being recorded. Subscribe to our YouTube: PyMC Developers - YouTube

Follow PyMC on LinkedIn to stay up to date on our community announcements: https://www.linkedin.com/company/pymc/



Mistplay is hiring for a Senior Data Scientist (Remote or Montreal, QC): Mistplay - Scientifique des données senior // Senior Data Scientist


Sandra’s blogpost:

Martin’s blog:
https://martiningram.github.io/mcmc-comparison/

We have an upcoming event on contributing to PyMC, organized by Data Umbrella: [ONLINE] PyMC Open Source Working Session (Latin America region), Thu, Mar 30, 2023, 10:00 AM | Meetup

Questions in Zoom:

When I try to replicate the tennis.ipynb on a SageMaker instance, I run into out-of-memory errors when working with the GPU implementations of either BlackJAX or NumPyro. The errors come from the XLA library. If I run the CPU implementation, there are no errors, but of course it takes longer. Are there any hardware requirements to run this notebook? And are there any code hints / guidelines for working with GPUs and PPLs in order to avoid memory issues?
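
One knob that often matters for these XLA out-of-memory errors is JAX's GPU allocator, which by default preallocates most of the device memory at startup. A minimal sketch using JAX's documented allocator environment variables (the 60% cap is just an illustrative value):

```python
import os

# Tell XLA not to grab most of the GPU up front, and cap JAX's share.
# These must be set BEFORE jax is imported.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.60"  # illustrative 60% cap
# Slower alternative that frees buffers as soon as they are released:
# os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"] = "platform"

import jax  # imported after the flags on purpose

print(jax.devices())  # confirm which devices JAX can see
```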

So is the final conclusion that PyMC with BlackJAX is the faster and more scalable option?

How can we parallelize the sampling process in PyMC? Is that a valid approach for scaling Bayesian modeling?
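
For context, PyMC parallelizes across chains out of the box: each chain gets its own worker process via the cores argument of pm.sample. A minimal sketch with toy data:

```python
import pymc as pm

with pm.Model():
    mu = pm.Normal("mu", mu=0, sigma=1)
    pm.Normal("obs", mu=mu, sigma=1, observed=[0.1, -0.3, 0.7])
    # Four chains spread over four worker processes;
    # the chain is the unit of parallelism here, not the gradient evaluation.
    idata = pm.sample(draws=1000, tune=1000, chains=4, cores=4)
```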

BlackJAX failed for larger datasets in the first example. Do you know what went wrong?

I understand that libraries that use JAX interact nicely with each other - e.g. there are specialized Gaussian process libraries, such as tinygp. I understand it’s possible to mix tinygp with NumPyro, where the GP is defined with tinygp but sampled with NumPyro. Is the same possible with PyMC / are there plans to enable such interoperability?
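
For reference, PyMC ships its own GP module, and a model built with it can be handed to the JAX-backed samplers, since PyTensor graphs compile to JAX. Whether that fully matches the tinygp-plus-NumPyro workflow is a fair question for the speaker. A minimal sketch on synthetic data (priors and sizes are illustrative):

```python
import numpy as np
import pymc as pm
import pymc.sampling.jax as pmjax

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50)[:, None]               # 50 one-dimensional inputs
y = np.sin(X).ravel() + 0.1 * rng.normal(size=50)

with pm.Model():
    ell = pm.Gamma("ell", alpha=2, beta=1)        # lengthscale prior
    eta = pm.HalfNormal("eta", sigma=1)           # amplitude prior
    cov = eta**2 * pm.gp.cov.ExpQuad(1, ls=ell)   # squared-exponential kernel
    gp = pm.gp.Marginal(cov_func=cov)
    sigma = pm.HalfNormal("sigma", sigma=0.5)     # observation noise
    gp.marginal_likelihood("y", X=X, y=y, sigma=sigma)
    # Sample the PyMC-defined GP with the JAX-backed NumPyro NUTS sampler
    idata = pmjax.sample_numpyro_nuts(chains=2)
```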

How many CPU cores does your notebook use per sampling chain? Also, is the number of cores used per chain the same for each library (NumPyro, BlackJAX, PyMC)? My understanding is that PyMC uses one core per sampling chain, but I could be wrong.
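
A related knob: the JAX-based samplers expose a chain_method argument controlling how chains are mapped onto hardware, which differs from PyMC's one-process-per-chain default. A sketch with a toy model (settings are illustrative):

```python
import pymc as pm
import pymc.sampling.jax as pmjax

with pm.Model():
    mu = pm.Normal("mu", mu=0, sigma=1)
    pm.Normal("obs", mu=mu, sigma=1, observed=[0.2, -0.4, 0.9])
    # "parallel": one chain per JAX device (needs at least `chains` devices);
    # "vectorized": all chains batched on a single device via vmap.
    idata = pmjax.sample_numpyro_nuts(
        draws=1000, tune=1000, chains=4, chain_method="vectorized"
    )
```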

Thanks for the nice presentation. I was able to reproduce the results of the blog using NumPyro and BlackJAX running on GPUs. When I run my own model sampling from a mixture distribution with the PyMC sampler it works fine, but running with the BlackJAX or NumPyro samplers it fails. Have you tested the GPU samplers with mixture models? Thanks!
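
For anyone who wants to dig into this class of failure, a minimal normal-mixture model that can be pointed at either backend looks roughly like the sketch below (synthetic data; swap the last line for pm.sample() or the NumPyro variant to compare):

```python
import numpy as np
import pymc as pm
import pymc.sampling.jax as pmjax

rng = np.random.default_rng(0)
# Synthetic two-component mixture data
data = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])

with pm.Model():
    w = pm.Dirichlet("w", a=np.ones(2))                   # mixture weights
    mu = pm.Normal("mu", mu=0, sigma=5, shape=2)          # component means
    components = pm.Normal.dist(mu=mu, sigma=1, shape=2)  # batched components
    pm.Mixture("obs", w=w, comp_dists=components, observed=data)
    # Compare against pm.sample() or pmjax.sample_numpyro_nuts()
    idata = pmjax.sample_blackjax_nuts(chains=2)
```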

Thanks. Is there a way to use more cores per chain?

When using NumPyro and BlackJAX on GPUs, is there a way to “clear” the GPU memory so that it can handle bigger datasets? In other words, as data is being used to update the posterior, remove past data that has already been used.
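
Worth noting: NUTS conditions on the full dataset at every gradient evaluation, so there is no built-in way to "retire" data the sampler has already seen. When the data no longer fits, the usual scaling route in PyMC is minibatch variational inference. A minimal sketch with pm.Minibatch and ADVI (batch size and iteration count are arbitrary examples):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=2.0, size=100_000)

# Each gradient step sees a random 500-row slice instead of the full array
batch = pm.Minibatch(data, batch_size=500)

with pm.Model():
    mu = pm.Normal("mu", mu=0, sigma=10)
    sigma = pm.HalfNormal("sigma", sigma=5)
    # total_size rescales the minibatch likelihood up to the full dataset
    pm.Normal("obs", mu=mu, sigma=sigma, observed=batch, total_size=len(data))
    approx = pm.fit(n=20_000, method="advi")  # stochastic VI instead of NUTS

idata = approx.sample(1_000)  # draws from the fitted approximation
```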

I have previously attempted to configure PyMC to leverage a GPU but ran into complexity trying to appropriately configure the infrastructure.
It is perhaps outside the scope of the PyMC devs, though a phenomenal contribution would be a Docker container available on Docker Hub, ready to work on AWS EC2 or other pre-defined infrastructure.

How many GPUs were used to run the tennis notebook?

From your experience, is pm.sampling.jax.sample_numpyro_nuts faster than native NumPyro code when working on CPUs only? Also, if I understood correctly, the sample_numpyro_nuts function does not need a GPU to be executed; can you confirm?
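
On the second part: JAX falls back to its CPU backend when no accelerator is present, so sample_numpyro_nuts does not require a GPU. A sketch that pins JAX to the CPU explicitly (toy model):

```python
import jax

jax.config.update("jax_platform_name", "cpu")  # pin JAX to the CPU backend

import pymc as pm
import pymc.sampling.jax as pmjax

with pm.Model():
    mu = pm.Normal("mu", mu=0, sigma=1)
    pm.Normal("obs", mu=mu, sigma=1, observed=[0.2, -0.1, 0.5])
    # Runs without any GPU attached
    idata = pmjax.sample_numpyro_nuts(chains=2)
```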
