Let's introduce ourselves!

Hello, I am Hannes,

I did a PhD in deformation analysis of volcanic sources using satellite radar data at the German Research Centre for Geosciences. Now I am doing the same thing for earthquakes, where it makes sense to combine it with seismic data. To bring all of this together and do proper error propagation etc., which is unfortunately not yet state of the art in geosciences, I started to learn Bayesian inference.

I contributed the Sequential Monte Carlo sampler to pymc3 and hope to do more in the future.

This might be interesting for others from the geoscience community who stumble upon this: I developed BEAT (Bayesian Earthquake Analysis Tool, https://github.com/hvasbath/beat) on top of pymc3. It implements all the necessary data juggling and model setup for modeling earthquakes and volcano deformation fields.

Hi,

I am Gergely, a lecturer in biochemistry at the University of Gothenburg. I work mainly with X-ray crystallography and time-resolved diffraction technique development. I am interested in the role of autonomous vibrations in biochemical systems, but I also enjoy doing traditional structural biology with all the surrounding biophysical techniques. Our published applications of pymc3 are so far limited to crystallography, where we looked at distributions of intensity observations with the aim of getting better structure factor estimates. In addition, we have various plans for (robust) curve fitting using hierarchical models (SAXS, thermophoresis), comparing categorical distributions based on chromatin immunoprecipitation locations, image analysis, text mining of literature, etc. It is great to see pymc3 evolving (I started following it about three years ago); it is a remarkable community effort!

Hi, I'm Dan,
I studied Physics & Philosophy, which contained a lot of philosophy of science and was my first exposure to Bayesian probability, which I found very persuasive. I've published in data science journals but have since moved away from academia, and have been working on a contract basis applying data science and machine learning in businesses. I'm currently using PyMC3 to do large-scale automated multivariate testing/multi-armed bandit solutions across a travel business. I also compete on Kaggle and am looking forward to using some Bayesian power in future competitions now that I finally have a handle on applying it in practice.
I doubt I will ever be knowledgeable enough to contribute to the core development, but I already have drafts of a few tutorials/documentation contributions that I wanted but haven't found elsewhere ^^

How have you deployed your models DB?

Hey,

So yep, I've deployed the first version. Here is a quickly typed run-through of the system, which runs daily:

  • It takes a list of MVTs that are currently live on incoming customer traffic in the organisation.
  • It pulls all available data pertaining to that MVT, and munges it appropriately
  • It then feeds it into a PyMC3 model, & samples (NUTS)
    • The model is quite simple, it infers 3 key metrics (e.g. conversion) for each arm of the MVT that are usually what we want to be optimised.
  • The inference part outputs the trace, the probability of each arm being optimal for each metric, and any PyMC3 warning information
  • All of this information then gets sent to our company slack via specific channels that are just used for reporting
    • The operational team that runs the MVTs gets the following information for each MVT via the channel: general stats on the data; the probabilities of each arm being optimal for each metric, graphically and in text (these are also the optimal traffic shares for each arm, standard Bayesian multi-armed bandit stuff); and posterior distributions for each metric with the frequentist values overlaid.
    • I have a private channel for more detailed monitoring of the analysis. For each MVT it gets everything that the ops team gets, plus full trace plots of all the variables in the model, auto-correlation plots, PPCs, and any warnings from PyMC3.
  • The ops team then uses this information to optimise the traffic going to each arm to minimise "regret", and to iterate through MVTs as fast as possible (they see as soon as possible when they've reached whatever threshold for ditching an arm, etc.).
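For anyone wondering how the "probability of each arm being optimal" can be read off posterior samples, here is a minimal sketch. It is not the production model: the conversion counts are made up, and a conjugate Beta posterior sampled with NumPy stands in for the trace that NUTS would produce. The helper name `prob_arm_optimal` is hypothetical.

```python
import numpy as np


def prob_arm_optimal(samples):
    """Fraction of posterior draws in which each arm has the highest value.

    samples: array-like of shape (n_draws, n_arms), one column per arm.
    """
    samples = np.asarray(samples)
    best = np.argmax(samples, axis=1)  # index of the winning arm in each draw
    counts = np.bincount(best, minlength=samples.shape[1])
    return counts / samples.shape[0]


rng = np.random.default_rng(0)

# Hypothetical data: conversions and visitors for three arms of one MVT.
conversions = np.array([30, 45, 38])
visitors = np.array([1000, 1000, 1000])

# Beta(1, 1) prior with a binomial likelihood gives a Beta posterior.
# These draws play the role of the posterior samples in a PyMC3 trace.
draws = rng.beta(1 + conversions, 1 + visitors - conversions, size=(20000, 3))

# Probability that each arm is optimal; under Thompson sampling these
# are also the traffic shares to assign to each arm.
shares = prob_arm_optimal(draws)
print(shares)
```

With a real trace, the same helper could be applied to the stacked samples of the per-arm metric, as long as they are arranged with one column per arm.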

Infrastructure: an Airflow job that runs the Python script/package on a small EC2 instance. The environment is managed in conda. It's all in all very lightweight so far. Reporting directly via Slack meant skipping all the time and overhead of building a UI (though this will have to happen soon, as the amount of information is fast becoming unmanageable). It is currently running on 90 MVTs per day, with no run-time optimisation required so far.

Successes:

  • Ops team iterate way faster now (before they just ran all tests for 2 weeks, irrespective of amounts of data coming in).
  • Now optimised for exploration vs exploitation tradeoff.

Pain points/difficulties:

  • Seems like a minor one, but... PyMC3 warnings: the only way (I found) to capture them is to capture stderr, which also means that if you get an error during sampling, it will be lost... Still working on a better solution. Ideas appreciated.
  • Currently rerunning the entire inference on all data each time: no "online" learning/inference or state kept between runs. Still working on how best to implement this, and whether it's even required yet. (It will be required in the end game for the project.)
  • (Onboarding non-data-science colleagues into using/trusting it: putting effort into a really clean and simple presentation of the results that they really care about (traffic share per arm and probability that an arm is optimal), and putting that somewhere they can really easily find it/pushing notifications about it :sunglasses:, really helped with this. This issue is not PyMC3-specific though, of course.)
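One possible alternative to capturing stderr for the warnings, sketched under an assumption worth checking against the installed version: that the sampler warnings go through Python's standard `logging` module under a logger named `pymc3`. If so, a temporary handler can collect them while exceptions still propagate normally. `ListHandler` and `capture_log_warnings` are hypothetical helper names, and the warning text below is a stand-in for a real `pm.sample(...)` call.

```python
import logging
from contextlib import contextmanager


class ListHandler(logging.Handler):
    """Logging handler that stores formatted messages in a list."""

    def __init__(self, level=logging.WARNING):
        super().__init__(level=level)
        self.records = []

    def emit(self, record):
        self.records.append(record.getMessage())


@contextmanager
def capture_log_warnings(logger_name="pymc3"):
    """Collect WARNING-and-above messages from the named logger.

    The logger name "pymc3" is an assumption, not a documented guarantee.
    """
    logger = logging.getLogger(logger_name)
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        yield handler.records
    finally:
        # Always detach the handler, even if sampling raised an error.
        logger.removeHandler(handler)


with capture_log_warnings("pymc3") as captured:
    # In the real job, pm.sample(...) would run here. This line only
    # simulates a warning being logged during sampling.
    logging.getLogger("pymc3").warning("There were %d divergences after tuning.", 3)

print(captured)
```

Because nothing is redirected, an exception raised inside the `with` block surfaces as usual, and the captured list can then be forwarded to the monitoring channel.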

Future version features:

  • Some of the distributions involved can be really funky and are not well modelled by the current version, so I have been testing Dirichlet process mixtures for much more accurate inference over more varied distributions. POCs are looking good so far.
  • Using a proper UI for reporting - will store results of analysis in S3 and just pull to a simple Shiny/Dash
  • Automatically setting the optimal traffic shares for each MVT each time the system runs, skipping the ops team. (Building trust in the system before doing this.)
    • Upping the frequency of system runs to update optimal shares sooner as data comes in.

End game (not near a solution yet):

  • Rather than hard-coding actions (as currently) and then letting the strategies compete to find which is optimal, shifting to online reinforcement learning to optimise actions as a function of the entire parameter space for each individual incoming event. Any resources/pointers on this appreciated.

Hope that might have been interesting for some of you. Happy to answer questions. Big thanks to the PyMC3 devs again.

Hey, I'm qiu. I am a Ph.D. student in electrical engineering at Hunan University, China. My research direction is the reliability of electrical power systems and intelligent instruments. I love Bayesian statistics and I hope PyMC3 will help me to graduate, hahaha. I also want to find out how Bayes can be used in machine learning, and I am looking for a chance to be a joint Ph.D. student in another country. Happy to know all of you, haha.

Hey all
I am Paridhi, an undergraduate student in Mathematics and Computing at IIT Guwahati, India. I am a math enthusiast, and in the previous few semesters we had many interesting courses in finance, probability, and scientific computing. I'm also a computing enthusiast and I'd like to start contributing to PyMC3. It is quite amazing, and I was going through a few beginner issues I could tackle... I found this: Handle warnings better in stats.py #2472
https://github.com/pymc-devs/pymc3/issues/2472

I'd like to contribute to this. But then, I am really new to this, so can someone guide me a bit?
PS: I was going through the other introductions. So intimidating :). I am only getting started.

Thanks
Paridhi

Hi Paridhi,
If you have any questions, you can open a separate post here. Otherwise, you can just fire up a PR and the devs will comment on it.
However, if you are just starting to get familiar with PyMC3, I would suggest you run the example notebooks. Some of them have not been updated for a while, and you can report or fix any bugs you find.
Junpeng

Okay, thanks! I'll start with running the example notebooks.

Hello everyone!

I am Agustina, a PhD student in the field of structural bioinformatics. I've recently been accepted as a Google Summer of Code student to work with PyMC3 :slight_smile: I'll be working on an approximate Bayesian computation module for PyMC3. I look forward to contributing to and learning from this project!

Welcome to the PyMC3 community, Agustina. Congratulations on being accepted.

Hello everyone, I'm Neeraj!

I'm beginning a Masters program in Statistics at the University of Illinois Urbana-Champaign this August. I'm an aspiring researcher in machine learning, currently seeking inspiration + conviction for a specific research area. Diving into Statistics with a CS background is daunting, but resources like "Bayesian Methods for Hackers" make life tractable!

I love developing software and have dabbled in a variety of languages (incl. Fortran, JavaScript, Go), environments (web apps, scientific computing, distributed, shell) and databases (incl. MongoDB, Neo4j). I have completed a few applied ML projects using TensorFlow, Apache Spark and Kafka as well. Oddly enough, I've never contributed to open source software.

With pymc3/4, I'm hoping to:

  • learn the ropes of open source development
  • learn Bayesian theory
  • improve research skills by replicating results/algorithms
  • improve programming skills by contributing production-ready code + documentation

Looking forward to learning and contributing!

Twitter: @neeraj_wagh
Email: neeraj.wagh@outlook.com, nwagh2@illinois.edu

Welcome to the team, Neeraj!

Hi. I am Cynthia, a PhD CS student. I got my applied mathematics MS degree at the University of Washington, but discovered that I was not that into differential equations, which is apparently what that degree is all about (oops). I got into CS because of an internship which showed me how cool machine learning can be. ML requires a lot of stats knowledge, so I am reading Probabilistic Programming and Bayesian Methods for Hackers, which also helps me learn how to use PyMC3. I also find that Probability for Risk Management by Hassett is a great reference for those who need a quick review of continuous and discrete distributions.

My goal is to use PyMC3 and Bayesian stats to help me detect anomalies in conversational data between humans and chatbots. Because such data can change over time, I will probably need mixture models and Dirichlet processes, which are more "adaptive".

Hey, Dmitriy here. I am a master's candidate at Virginia Commonwealth University and started using PyMC3 when a colleague introduced me to it. I am using the library at work and am excited to engage.

Hi everyone,
I'm Juan Martín Loyola, a PhD student in computer science at the University of San Luis (Argentina).
Last year I started learning Bayesian statistics with a course that used PyMC3. I hope to keep learning in this community. :smile:
I love playing football (or "fútbol" as we call it here in Argentina) with friends, although I'm a little bit rusty lately. I also like playing ping-pong and listening to music.
