We have been thinking about gathering information about PyMC usage, and we’d like to have a PyMCon event to structure and advertise this effort. This initial topic is for introducing the idea and creating a space for everyone to join the discussion and help us define the initiative: what information should be gathered? What information do you struggle to share with devs?
Ideally we’d gather a corpus of models from anyone who is willing and able to share them, but we’ll also need to analyze those models. We will therefore soon start working on an analysis tool that anyone can run locally, sharing only the results with us (e.g. which distributions are used, which sampling methods, which of their defaults have been changed…).
Here are some of our initial ideas on information we’d like to have; please add more in the thread below!
- Which distributions are being used and how often?
- How big are the models? How many variables are sampled by MCMC, and how many observations are there? How close are we to models that don’t fit in the RAM of common computers?
- Which sampling functions are most common? Which defaults are most often modified?
- Which operations are most common with PyMC’s outputs: plotting with ArviZ, saving to disk, converting to NumPy/pandas objects…?