Extended event: gathering PyMC usage information

Hi!

We have been thinking about gathering information about PyMC usage, and we’d like to have a PyMCon event to structure and advertise this effort. This initial topic is for introducing the idea and creating a space for everyone to join the discussion and help us define the initiative: what information should be gathered? What information do you struggle to share with devs?

Ideally we’d gather a corpus of models from anyone who is willing and able to share. Since we’ll also need to analyze those models, we will soon start working on a tool for that, so that anyone can run the analysis locally and share only the results with us (e.g. which distributions are used, which sampling methods, which of their defaults have been changed…).

Here are some of our initial ideas on information we’d like to have; please add more on the topic below!

  • Which distributions are being used and how often?
  • How big are the models? Number of variables sampled by MCMC? Number of observations? How close are we to models that don’t fit in the RAM of common computers?
  • Which sampling functions are most common? Which defaults are most often modified?
  • Which operations are most common with PyMC’s outputs: plotting with ArviZ, saving to disk, converting to NumPy/pandas objects…
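The local analysis tool described above could start as a simple static analysis of model source code. Here is a minimal sketch using only the standard library's `ast` module; the function name, report keys, and the heuristic that PyMC distributions are CapWords attributes of `pm` are all assumptions for illustration, not part of any existing PyMC tooling:

```python
# Hypothetical sketch: count which PyMC distributions a model file uses
# and which pm.sample() defaults are overridden, without executing it.
import ast
from collections import Counter

def analyze_model_source(source: str) -> dict:
    """Return distribution usage counts and pm.sample keyword overrides."""
    tree = ast.parse(source)
    dists = Counter()
    sample_overrides = []
    for node in ast.walk(tree):
        # Look for calls of the form pm.<something>(...)
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "pm"):
            name = node.func.attr
            if name == "sample":
                # Keyword arguments passed to pm.sample = defaults the
                # user chose to override.
                sample_overrides.extend(kw.arg for kw in node.keywords if kw.arg)
            elif name[0].isupper() and name != "Model":
                # Heuristic: distribution classes are CapWords (pm.Normal,
                # pm.HalfNormal, ...); pm.Model is excluded.
                dists[name] += 1
    return {"distributions": dict(dists), "sample_overrides": sample_overrides}

# Illustrative model source, only used to exercise the sketch.
example = """
import pymc as pm
with pm.Model():
    mu = pm.Normal("mu", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    y = pm.Normal("y", mu, sigma, observed=[1.0, 2.0])
    idata = pm.sample(draws=2000, target_accept=0.95)
"""
report = analyze_model_source(example)
print(report)
# {'distributions': {'Normal': 2, 'HalfNormal': 1},
#  'sample_overrides': ['draws', 'target_accept']}
```

A real tool would need to handle aliased imports and `from pymc import ...`, but even this rough pass answers the first and third bullets above without the model or data ever leaving the user's machine.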

Questions I have

  • How many users are using pymc3 vs pymc
  • What are most common packages also imported
    • How many folks use xarray operations, arviz, scipy, etc.
  • Do people use the default sampler, specify their own, or change sampler arguments
  • What are the most common prior parameters
    • I know this depends on the context of the model, but I’m curious whether anything sticks out for different distributions
  • Which backend is being used?
  • Number of divergences?
  • Total sampling time?
  • ESS
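For diagnostics like these, the "share only the results" idea could mean reducing raw per-draw statistics to an anonymous summary before anything is sent. A minimal sketch, assuming input shaped like PyMC's `sample_stats` (a boolean `diverging` flag per chain and draw, plus a total sampling time); the function and its report keys are hypothetical, and nothing here calls PyMC or ArviZ — a real run would read these from an `InferenceData` object (e.g. `idata.sample_stats["diverging"]`) and compute ESS with `arviz.ess`:

```python
# Hypothetical sketch: collapse per-draw sampling diagnostics into an
# aggregate summary that reveals nothing about the model or data.

def summarize_diagnostics(diverging, sampling_time):
    """diverging: list of per-chain lists of bools; sampling_time: seconds."""
    n_chains = len(diverging)
    n_draws = len(diverging[0]) if diverging else 0
    n_div = sum(sum(chain) for chain in diverging)  # True counts as 1
    total = n_chains * n_draws
    return {
        "chains": n_chains,
        "draws_per_chain": n_draws,
        "divergences": n_div,
        "divergence_rate": n_div / total if total else 0.0,
        "sampling_time_s": sampling_time,
    }

# Two chains of four draws, with one divergence in the first chain.
summary = summarize_diagnostics(
    diverging=[[False, True, False, False], [False, False, False, False]],
    sampling_time=12.3,
)
print(summary)
# {'chains': 2, 'draws_per_chain': 4, 'divergences': 1,
#  'divergence_rate': 0.125, 'sampling_time_s': 12.3}
```

ESS and backend identification would be added the same way: computed locally, reported as a few numbers or labels rather than the traces themselves.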
  • Scientific domains/industries where PyMC is being used
  • Types of data being studied (purely cross-sectional, purely time series, longitudinal, geospatial…)
  • Size of datasets being analyzed
  • Use of coords/named dims in models
  • Causal identification strategies (if any/applicable)
  • Repos associated with published papers?
  • Use of “basic” PyMC vs specialized sub-modules/associated projects: GP, BART, sunode, Bambi, (others?)