Production grade prediction

gyarnykh · August 20, 2017, 4:53pm

How do people generally transfer their models and make live production predictions after fitting PyMC3 models? Do they use a different package for that? In addition, how would you make online updates to the model in that case (i.e. online learning)?

I know that the main value of Bayesian methods is inference and better insight into models on small/medium datasets. However, results where integrating over the parameter distribution gives superior predictions (http://twiecki.github.io/blog/2017/03/14/random-walk-deep-net/) make you want to try this instead of neural net packages, which are suited to these online-updating tasks by default.

twiecki · August 21, 2017, 11:13am

There isn’t too much info available on that, alas. Nicole has done some nice work in creating a sklearn wrapper for PyMC3 models, which is probably a good starting point for productizing: https://github.com/parsing-science/ps-toolkit/blob/master/ps_toolkit/pymc3_models/HLM.py. Then there is also sampled by @colcarroll: https://github.com/ColCarroll/sampled

AustinRochford · August 21, 2017, 1:42pm

FWIW, I haven’t really found a good, convenient way to do online learning with PyMC3.

colcarroll · August 24, 2017, 4:33pm

I keep meaning to try doing something with the Interpolated distribution to get something working online, though it feels like that might still be too computationally intensive to do (correctly) on the fly.
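To make the idea concrete, here is a minimal plain-Python sketch of sequential updating with a tabulated density: the posterior from one batch is stored on a grid and reused as the prior for the next batch. It is not PyMC3 code. It assumes a single Bernoulli success probability, and the batch counts are made up for illustration. In PyMC3 the tabulated density would be handed to pm.Interpolated, which is not shown here.

```python
def normalize(weights, step):
    """Scale weights so they integrate to 1 on an evenly spaced grid."""
    total = sum(weights) * step
    return [w / total for w in weights]

def update_on_grid(grid, prior_pdf, successes, failures):
    """Multiply the tabulated prior by a Bernoulli likelihood and renormalize."""
    step = grid[1] - grid[0]
    unnormalized = [
        p * (theta ** successes) * ((1.0 - theta) ** failures)
        for theta, p in zip(grid, prior_pdf)
    ]
    return normalize(unnormalized, step)

def posterior_mean(grid, pdf):
    """Approximate the mean of the tabulated density with a Riemann sum."""
    step = grid[1] - grid[0]
    return sum(t * p for t, p in zip(grid, pdf)) * step

# Grid over the success probability, avoiding the endpoints 0 and 1.
n_points = 999
grid = [(i + 1) / (n_points + 1) for i in range(n_points)]
flat_prior = normalize([1.0] * n_points, grid[1] - grid[0])

# Hypothetical data arriving in two batches: (successes, failures).
batches = [(7, 3), (12, 8)]

pdf = flat_prior
for successes, failures in batches:
    # Yesterday's posterior is today's prior.
    pdf = update_on_grid(grid, pdf, successes, failures)

# Fitting all the data at once from the flat prior should agree.
all_at_once = update_on_grid(grid, flat_prior, 19, 11)
```

A grid like this only scales to one or two parameters, which is one reason the approach may be too expensive to do correctly on the fly for larger models.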

I keep meaning to build a sample PyMC3 web app, just to work out how much work needs to be done to make that sort of thing easy-ish. Perhaps that would be followed up with a more comprehensive dash project.

Current plan for sampled is to use it to prebuild models and distribute them (note: you can do this with base PyMC3 as well). This lets you isolate the modeling step from the training step.
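Something like:

# Hypothetical module holding prebuilt model definitions (e.g. built with sampled).
from models import naive_bayes
import pymc3 as pm

# load_training_data() is a placeholder for your own data loading.
corpus, labels = load_training_data()

# Training step: the prebuilt model receives the data as keyword arguments.
with naive_bayes(corpus=corpus, labels=labels):
    trace = pm.sample()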

ferrine · August 26, 2017, 4:41pm

I suggest using a variational approximation for that task if the model is large, or using pm.Empirical to wrap a trace. This way you can build a pure Theano graph for the expression of interest with respect to the inputs and save it properly.
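As a rough plain-Python analogue of that workflow (not pm.Empirical itself, and not a Theano graph), the sketch below wraps a set of posterior draws in an object that computes predictions as a function of the input and can be saved and reloaded without the sampler. The class name, the JSON layout, and the draws are all hypothetical, and the model is assumed to be a one-predictor logistic regression.

```python
import json
import math

class PosteriorPredictor:
    """Holds posterior draws of (intercept, slope) for a logistic regression."""

    def __init__(self, draws):
        # draws: list of {"intercept": float, "slope": float}
        self.draws = draws

    def predict_proba(self, x):
        """Average the predicted probability over all posterior draws."""
        probs = [
            1.0 / (1.0 + math.exp(-(d["intercept"] + d["slope"] * x)))
            for d in self.draws
        ]
        return sum(probs) / len(probs)

    def to_json(self):
        """Serialize the draws so they can be written to disk or a store."""
        return json.dumps({"draws": self.draws})

    @classmethod
    def from_json(cls, text):
        """Rebuild a predictor from the output of to_json()."""
        return cls(json.loads(text)["draws"])

# Hypothetical draws; in practice these would be exported from a fitted trace.
draws = [
    {"intercept": -1.0, "slope": 2.0},
    {"intercept": -0.8, "slope": 1.7},
    {"intercept": -1.2, "slope": 2.3},
]

trained = PosteriorPredictor(draws)             # training side
saved = trained.to_json()                       # persist this string
served = PosteriorPredictor.from_json(saved)    # serving side, no sampler needed
```

Averaging the probability over draws, rather than plugging in a single point estimate, is what keeps the parameter uncertainty in the prediction.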

springcoil · September 16, 2017, 9:59pm

I keep meaning to dig into this too with a simple dash app or something just to see how it works.

Maybe at work I’ll hack together something.

springcoil · September 23, 2017, 2:56pm

I tried to build an API (https://gist.github.com/springcoil/b8ef6d073349b9102ec49845b757f6b4) with sampled from @colcarroll. Unfortunately, I couldn’t quite get it to work. Does anyone else have any insight into making this work? My web API knowledge isn’t great.

colcarroll · September 24, 2017, 3:27pm

I just pushed a small Flask app I’ve been working on – first time trying the cookiecutter app to set something up, but it seems pretty plug and play – see the README for two lines to get it running locally. It exposes a bit of an API, and also creates a UI to interact with it and visualize the results using d3js. I’ll try to add more bells and whistles later
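For anyone who wants to see the general shape of such a prediction API without Flask, here is a minimal sketch using only the Python standard library. It is not the app described above. The draws, the request format ({"x": ...}), and the port are assumptions made for illustration.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical posterior draws of (intercept, slope), loaded once at startup.
DRAWS = [(-1.0, 2.0), (-0.8, 1.7), (-1.2, 2.3)]

def predict(x):
    """Posterior-mean probability for a single input value."""
    probs = [1.0 / (1.0 + math.exp(-(a + b * x))) for a, b in DRAWS]
    return sum(probs) / len(probs)

def handle_body(body):
    """Turn a JSON request body into a JSON response body (both bytes)."""
    payload = json.loads(body.decode("utf-8"))
    result = {"probability": predict(float(payload["x"]))}
    return json.dumps(result).encode("utf-8")

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        response = handle_body(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)

def make_server(port=8000):
    """Build the server; call .serve_forever() on the result to start it."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)
```

Keeping handle_body separate from the handler class means the prediction logic can be tested without opening a socket. The sketch has no input validation or error handling, which a real service would need.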

dunovank · March 29, 2021, 4:16pm

Hey @twiecki , I’m getting a 404 from this link: https://github.com/parsing-science/ps-toolkit/blob/master/ps_toolkit/pymc3_models/HLM.py

Anyone have an updated link?

P.S. Also, if anyone has additional examples on putting a pymc3 model into production - ideally with the ability to periodically (scheduled or event-triggered) update posteriors with new observations - I’d be very grateful. Thanks!

twiecki · March 30, 2021, 8:50pm

Hey @dunovank, good to see you here!

We’re working with several clients who run models in production, but there isn’t really a great resource. A replacement for that toolbox (which is also abandonware) can be found in the pymc-learn/pymc-learn repository on GitHub ("pymc-learn: Practical probabilistic machine learning in Python").

Then there is also Mike Krieger’s Medium post, "Automating daily runs for rt.live’s COVID-19 data using Airflow & ECS".

Feel free to contact me directly if you want to discuss more.

dunovank · March 30, 2021, 10:35pm

Great, thanks @twiecki !

Basically, I have a docker pod that includes some representation learning and vector indexing services as well as a really simple binary classifier (like super simple, single predictor, logistic regression). Currently I’m just using sklearn for the classifier, but the data I’m working with has a natural hierarchical structure that would benefit from a partial pooling approach.
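To illustrate what partial pooling buys in a setting like this, here is a small plain-Python sketch that shrinks each group's positive rate toward the overall rate. It is a fixed-strength beta-binomial style shrinkage, not a full hierarchical model: in a real hierarchical model the amount of pooling would be inferred from the data. The group names, counts, and prior_strength value are hypothetical.

```python
def partial_pool(group_counts, prior_strength=10.0):
    """Shrink each group's positive rate toward the overall rate.

    group_counts maps group name -> (positives, total).
    prior_strength is a hypothetical pseudo-count: larger means more pooling.
    """
    total_pos = sum(p for p, _ in group_counts.values())
    total_n = sum(n for _, n in group_counts.values())
    overall = total_pos / total_n
    return {
        name: (pos + prior_strength * overall) / (n + prior_strength)
        for name, (pos, n) in group_counts.items()
    }

# Hypothetical data: a large group and a tiny one.
counts = {"large": (300, 1000), "tiny": (2, 2)}
pooled = partial_pool(counts)
```

The large group's estimate barely moves from its raw rate, while the tiny group's raw rate of 1.0 is pulled strongly toward the overall rate, which is the behavior a separate classifier per group cannot give.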

Anyway, I will most likely reach out directly at some point in the coming weeks - would be great to catch up and get any advice on deploying pymc3.
