Let's introduce ourselves!

hvasbath · September 25, 2017, 12:39pm

Hello I am Hannes,

I did a PhD in deformation analysis of volcanic sources using satellite radar data at the German Research Centre for Geosciences. Now I am doing the same for earthquakes, where it makes sense to combine it with seismic data. To bring all of this together and do proper error propagation etc., which unfortunately is not yet state of the art in the geosciences, I started to learn Bayesian inference.

I contributed the Sequential Monte Carlo sampler to pymc3 and hope to do more in the future.

This might be interesting for others from the geoscience community who stumble upon this: I developed BEAT (Bayesian Earthquake Analysis Tool, https://github.com/hvasbath/beat) on top of pymc3. It implements all the necessary data juggling and model setup for modeling earthquake and volcano deformation fields.

gkatona · November 6, 2017, 10:39am

Hi,

I am Gergely, a lecturer in biochemistry at the University of Gothenburg. I work mainly on X-ray crystallography and the development of time-resolved diffraction techniques. I am interested in the role of autonomous vibrations in biochemical systems, but I also enjoy doing traditional structural biology with all the surrounding biophysical techniques. Our published applications of pymc3 are so far limited to crystallography, where we looked at distributions of intensity observations with the aim of getting better structure factor estimates. In addition, we have various plans for (robust) curve fitting using hierarchical models (SAXS, thermophoresis), comparing categorical distributions based on chromatin immunoprecipitation locations, image analysis, text mining of literature, etc. It is great to see pymc3 evolving (I started following it about three years ago); it is a remarkable community effort!

DBCerigo · November 8, 2017, 10:15am

Hi, I’m Dan,
I studied Physics & Philosophy, which contained a lot of philosophy of science and was my first exposure to Bayesian probability, which I found very persuasive. I’ve published in data science journals but have since moved away from academia, and have been working on contracts applying data science and machine learning in businesses. I’m currently using PyMC3 to do large-scale automated multivariate testing/multi-armed bandit solutions across a travel business. I also compete on Kaggle and am looking forward to using some Bayesian power in future competitions, now that I finally have a handle on applying it in practice.
I doubt I will ever be knowledgeable enough to contribute to core development, but I already have drafts of a few tutorials/documentation contributions that I wanted but haven’t found elsewhere ^^

springcoil · December 5, 2017, 9:09am

How have you deployed your models, DB?

DBCerigo · December 7, 2017, 10:55pm

Hey,

So yep, I’ve deployed the first version. Here’s a quickly typed-out run-through of the system, which runs daily:

1. It takes a list of the MVTs (multivariate tests) that are currently live on incoming customer traffic in the organisation.
2. It pulls all available data pertaining to each MVT and munges it appropriately.
3. It then feeds the data into a PyMC3 model and samples (NUTS).
   - The model is quite simple: it infers 3 key metrics (e.g. conversion) for each arm of the MVT, which are usually the ones we want to optimise.
4. The inference step outputs the trace, the probability of each arm being optimal for each metric, and any PyMC3 warning information.
5. All of this information then gets sent to our company Slack via dedicated channels that are used only for reporting.
   - The operational team that runs the MVTs gets the following for each MVT via the channel: general stats on the data; the probability of each arm being optimal for each metric, both graphically and in text (these are also the optimal traffic shares for each arm, standard Bayesian multi-armed bandit stuff); and posterior distributions for each metric with the frequentist values overlaid.
   - I have a private channel for more detailed monitoring of the analysis. For each MVT it gets everything the ops team gets, plus full trace plots of all the variables in the model, autocorrelation plots, PPCs (posterior predictive checks), and any warnings from PyMC3.
6. The ops team then uses this information to optimise the traffic going to each arm to minimise “regret”, and to iterate through MVTs as fast as possible (they see as soon as possible when they’ve reached whatever threshold for ditching an arm, etc.).
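The per-arm “probability of being optimal” mentioned above, and its use as a traffic share, can be illustrated with a minimal sketch. This is not the PyMC3/NUTS model described here: as a simplifying assumption it models a single conversion metric with a conjugate Beta-Binomial model, a uniform Beta(1, 1) prior, and made-up counts, and `prob_optimal` is a hypothetical name. For each joint posterior draw it records which arm has the highest sampled rate. The win frequencies estimate each arm’s probability of being optimal, which under probability matching (Thompson sampling) can be used directly as traffic shares.

```python
import random


def prob_optimal(arms, n_draws=10_000, seed=0):
    """Monte Carlo estimate of P(arm has the highest conversion rate).

    arms: dict mapping arm name -> (conversions, visitors).
    Assumption: a Beta(1, 1) prior, so each arm's posterior is
    Beta(1 + conversions, 1 + visitors - conversions).
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in arms}
    for _ in range(n_draws):
        # One joint posterior draw: sample a conversion rate for every arm.
        draws = {
            name: rng.betavariate(1 + conv, 1 + visits - conv)
            for name, (conv, visits) in arms.items()
        }
        # Credit the arm whose sampled rate is highest in this draw.
        wins[max(draws, key=draws.get)] += 1
    return {name: count / n_draws for name, count in wins.items()}


# Hypothetical counts for a three-arm test (not real data).
arms = {"A": (120, 2400), "B": (150, 2400), "C": (95, 2000)}
shares = prob_optimal(arms)
print(shares)
```

With a full PyMC3 trace the same idea applies: take the posterior samples of each arm’s metric and count, draw by draw, which arm comes out highest.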

Infrastructure: an Airflow job that runs the Python script/package on a small EC2 instance, with the environment managed in conda. All in all it’s very lightweight so far. Reporting directly via Slack meant skipping all the time and overhead of building a UI (though this will have to happen soon, as the amount of information is fast becoming unmanageable). It currently runs on 90 MVTs per day, and so far no optimisation effort has been required.

Successes:

- The ops team iterates way faster now (before, they just ran every test for 2 weeks, irrespective of how much data was coming in).
- Traffic is now optimised for the exploration vs. exploitation trade-off.

Pain points/difficulties:

- Seems like a minor one but… PyMC3 warnings: the only way I’ve found to capture them is to capture stderr, which also means that if you get an error during sampling, it will be lost… Still working on a better solution. Ideas appreciated.
- Currently the entire inference is rerun on all data each time: no “online” learning/inference and no state kept between runs. Still working on how best to implement this, or whether it’s even required (it will be required in the end game for the project).
- (Onboarding non-data-science colleagues into using/trusting it: putting effort into a really clean and simple presentation of the results they care about (traffic share per arm and the probability that an arm is optimal), and putting that somewhere they can find it really easily or pushing notifications about it, really helped with this. This issue is of course not PyMC3-specific.)
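On the warnings pain point, one possible alternative to capturing all of stderr is sketched here. It assumes that the messages of interest arrive either through Python’s `warnings` module or through the standard `logging` module under a logger named `"pymc3"` (an assumption about PyMC3 3.x internals that is worth checking against the installed version). The helper `capture_sampler_messages` and the stand-in `fake_sampling_run` are hypothetical, and the message strings are illustrative placeholders. Because nothing is redirected, an exception raised during sampling still propagates normally instead of being lost.

```python
import contextlib
import logging
import warnings


class _ListHandler(logging.Handler):
    """Logging handler that stores WARNING-and-above records in a list."""

    def __init__(self, sink):
        super().__init__(level=logging.WARNING)
        self.sink = sink

    def emit(self, record):
        self.sink.append(f"{record.levelname}: {record.getMessage()}")


@contextlib.contextmanager
def capture_sampler_messages(logger_name="pymc3"):
    """Collect warnings.warn() calls and log records from one named logger.

    Assumption: the library logs under `logger_name`. Exceptions raised
    inside the block are not swallowed; they propagate as usual.
    """
    messages = []
    logger = logging.getLogger(logger_name)
    handler = _ListHandler(messages)
    logger.addHandler(handler)
    try:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # record repeats too
            try:
                yield messages
            finally:
                messages.extend(f"{w.category.__name__}: {w.message}" for w in caught)
    finally:
        logger.removeHandler(handler)  # leave the logger as we found it


def fake_sampling_run():
    # Hypothetical stand-in for a pm.sample() call.
    logging.getLogger("pymc3").warning("There were 3 divergences after tuning.")
    warnings.warn("The acceptance probability does not match the target.", UserWarning)
    return "trace"


with capture_sampler_messages() as msgs:
    result = fake_sampling_run()

print(result, msgs)
```

The collected `msgs` list could then be forwarded to the monitoring channel alongside the trace plots.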

Future version features:

- Some of the distributions involved can be really funky and are not well modelled by the current version, so I’ve been testing Dirichlet process mixtures for much more accurate inference over more varied distributions. The POCs are looking good so far.
- Using a proper UI for reporting: the results of the analysis will be stored in S3 and pulled into a simple Shiny/Dash app.
- Automatically setting the optimal traffic shares for each MVT each time the system runs, skipping the ops team (building trust in the system before doing this).
- Increasing the frequency of system runs to update the optimal shares sooner as data comes in.
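Updating the shares more often, and keeping state between runs instead of rerunning inference on all data, is straightforward for a simple conversion metric if (as an assumption, rather than the NUTS-sampled PyMC3 model described above) it is modelled as conjugate Beta-Binomial. Each arm’s posterior parameters are stored and updated with only the new data on each run; `ArmState` and the counts are hypothetical. Because the Beta prior is conjugate to the Binomial likelihood, updating in batches gives exactly the same posterior as refitting on all the data at once. A non-conjugate model such as a Dirichlet process mixture would need an approximate scheme instead.

```python
from dataclasses import dataclass


@dataclass
class ArmState:
    """Beta(alpha, beta) posterior over one arm's conversion rate.

    Starts from a Beta(1, 1) (uniform) prior; the prior is an assumption.
    """
    alpha: float = 1.0
    beta: float = 1.0

    def update(self, conversions: int, visitors: int) -> None:
        """Fold in a new batch of data via the conjugate Beta-Binomial update."""
        if not 0 <= conversions <= visitors:
            raise ValueError("need 0 <= conversions <= visitors")
        self.alpha += conversions
        self.beta += visitors - conversions

    @property
    def mean(self) -> float:
        """Posterior mean conversion rate."""
        return self.alpha / (self.alpha + self.beta)


# Two daily batches processed incrementally...
incremental = ArmState()
incremental.update(conversions=30, visitors=600)
incremental.update(conversions=45, visitors=700)

# ...give the same posterior as one refit on all of the data.
batch = ArmState()
batch.update(conversions=75, visitors=1300)

print(incremental, round(incremental.mean, 4))
```

Only the pair `(alpha, beta)` per arm needs to be persisted between runs, for example next to the other analysis results.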

End game (not near a solution yet):

Rather than (as currently) hard-coding actions and then letting the strategies compete to find which is optimal, shift to using online reinforcement learning to optimise actions as a function of the entire parameter space for each individual incoming event. Any resources/pointers on this are appreciated.

Hope that might have been interesting for some of you. Happy to answer questions. Big thanks to the PyMC3 devs again.

Qiuchumo · December 8, 2017, 3:07am

Hey, I’m Qiu, a Ph.D. student in electrical engineering at Hunan University in China. My research direction is the reliability of electrical power systems and intelligent instruments. I love Bayesian statistics and I hope PyMC3 will help me graduate, hahaha. I also want to find out how Bayes can be used in machine learning, and I am looking for a chance to be a joint Ph.D. student in another country. Happy to meet all of you, haha.

melodramaqueen · December 17, 2017, 9:42am

Hey all
I’m Paridhi, an undergraduate student in Mathematics and Computing at IIT Guwahati, India. I’m a math enthusiast, and in the previous few semesters we had many interesting courses in finance, probability, and scientific computing. I’m also a computing enthusiast and I’d like to start contributing to PyMC3. It is quite amazing, and while going through a few beginner issues I could tackle, I found this one: Handle warnings better in stats.py #2472
https://github.com/pymc-devs/pymc3/issues/2472

I’d like to contribute to this, but I am really new to all of this, so can someone guide me a bit?
PS: I was going through the other introductions. So intimidating :). I am only getting started.

Thanks
Paridhi

junpenglao · December 17, 2017, 10:28am

Hi Paridhi,
If you have any questions, you can open a separate post here. Otherwise, you can just fire up a PR and the devs will comment on it.
However, if you are just starting to get familiar with PyMC3, I would suggest running the example notebooks. Some of them have not been updated for a while, and you can report or fix any bugs you find.
Junpeng

melodramaqueen · December 17, 2017, 11:06am

Okay, Thanks! I’ll start with running the example notebooks.

agustinaarroyuelo · April 26, 2018, 1:12pm

Hello everyone!

I’m Agustina, a PhD student in the field of structural bioinformatics. I’ve recently been accepted as a Google Summer of Code student with PyMC3, where I’ll be working on an approximate Bayesian computation module. I look forward to contributing to and learning from this project!

adam · April 27, 2018, 6:54pm

Welcome to the PyMC3 community, Agustina. Congratulations on being accepted.

neerajwagh · May 22, 2018, 3:24pm

Hello everyone, I’m Neeraj!

I’m beginning a Masters program in Statistics at the University of Illinois Urbana-Champaign this August. I’m an aspiring researcher in machine learning, currently seeking inspiration + conviction for a specific research area. Diving into Statistics with a CS background is daunting but resources like “Bayesian Methods for Hackers” make life tractable!

I love developing software and have dabbled in a variety of languages (incl. Fortran, JavaScript, Go), environments (web apps, scientific computing, distributed systems, shell) and databases (incl. MongoDB, Neo4j). I have completed a few applied ML projects using TensorFlow, Apache Spark and Kafka as well. Oddly enough, I’ve never contributed to open source software.

With pymc3/4, I’m hoping to:

- learn the ropes of open source development
- learn Bayesian theory
- improve research skills by replicating results/algorithms
- improve programming skills by contributing production-ready code + documentation

Looking forward to learning and contributing!

Twitter: @neeraj_wagh
Email: neeraj.wagh@outlook.com, nwagh2@illinois.edu

adam · May 22, 2018, 4:24pm

Welcome to the team, Neeraj.

cynthiaw2004 · May 24, 2018, 2:54am

Hi. I am Cynthia, a CS PhD student. I got my MS in applied mathematics at the University of Washington, but discovered that I was not that into differential equations, which is apparently what that degree is all about (oops). I got into CS because of an internship that showed me how cool machine learning can be. ML requires a lot of stats knowledge, so I am reading Probabilistic Programming and Bayesian Methods for Hackers, which also helps me learn how to use PyMC3. I also find Probability for Risk Management by Hassett a great reference for those who need a quick review of continuous and discrete distributions.

My goal is to use PyMC3 and Bayesian stats to help me detect anomalies in conversational data between humans and chatbots. Because such data can change over time, I may need to use mixture models and Dirichlet processes, which are more “adaptive”.

VoroninDA · December 13, 2018, 6:11pm

Hey, Dmitriy here. I am a master’s candidate at Virginia Commonwealth University and started using PyMC3 when a colleague introduced me to it. I am using the library at work and am excited to engage.

jmloyola · February 19, 2019, 9:16pm

Hi everyone,
I’m Juan Martín Loyola, a PhD student in computer science at University of San Luis (Argentina).
Last year I started learning Bayesian statistics with a course that used PyMC3. I hope to keep learning in this community.
I love playing football (or “fútbol” as we call it here in Argentina) with friends, although I’m a little bit rusty lately. I also like playing ping-pong and listening to music.
