Production grade prediction

How do people generally transfer their models or make live production predictions after fitting PyMC3 models? Do they use a different package for that? On top of that, how would you make online updates to your model in that case (i.e. online learning)?

I know that the main value of Bayesian methods is inference and better insight into models on small/medium datasets. However, results where integrating over the parameter distribution gives superior predictions (http://twiecki.github.io/blog/2017/03/14/random-walk-deep-net/) make you want to try this instead of neural net packages, which are suited to online updating by default.

There isn’t too much info available on that, alas. Nicole has done some nice work creating a sklearn wrapper for PyMC3 models, which is probably a good starting point for productionizing: https://github.com/parsing-science/ps-toolkit/blob/master/ps_toolkit/pymc3_models/HLM.py. Then there is also sampled by @colcarroll: https://github.com/ColCarroll/sampled

FWIW, I haven’t really found a good, convenient way to do online learning with PyMC3.

I keep meaning to try doing something with the Interpolated distribution to get something working online, though it feels like that might still be too computationally intensive to do (correctly) on the fly.
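To make the idea concrete, here is a minimal sketch of "the last posterior becomes the next prior" on a fixed grid. It is plain Python, not PyMC3, and it rests on assumptions: a single Bernoulli rate, a flat starting prior, and made-up batches of 0/1 observations. A grid of points plus density values is the same kind of input that pm.Interpolated takes, but nothing below calls PyMC3.

```python
import math


def make_grid(n_points=201):
    """Evenly spaced candidate values for a Bernoulli rate, strictly inside (0, 1)."""
    return [(i + 0.5) / n_points for i in range(n_points)]


def normalize(weights):
    """Scale weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def update(grid, prior, observations):
    """Return posterior weights over `grid` after a batch of 0/1 observations."""
    successes = sum(observations)
    failures = len(observations) - successes
    # Work in log space so large batches do not underflow.
    log_post = [
        (math.log(p) if p > 0 else -math.inf)
        + successes * math.log(theta)
        + failures * math.log(1.0 - theta)
        for theta, p in zip(grid, prior)
    ]
    peak = max(log_post)
    return normalize([math.exp(lp - peak) for lp in log_post])


def posterior_mean(grid, weights):
    """Expected value of the rate under the gridded distribution."""
    return sum(theta * w for theta, w in zip(grid, weights))


grid = make_grid()
prior = normalize([1.0] * len(grid))  # flat prior (an assumption)

# Two invented batches arriving at different times.
batch_1 = [1, 0, 1, 1, 0, 1, 1, 1]
batch_2 = [0, 1, 1, 0, 1]

after_1 = update(grid, prior, batch_1)
after_2 = update(grid, after_1, batch_2)  # previous posterior used as the prior

# Sanity check: two incremental updates should match one update on all the data.
all_at_once = update(grid, prior, batch_1 + batch_2)
print(posterior_mean(grid, after_2), posterior_mean(grid, all_at_once))
```

The cost of a grid grows exponentially with the number of parameters, so this only illustrates the mechanics for a one-dimensional case.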

I keep meaning to build a sample PyMC3 web app, just to work out how much work needs to be done to make that sort of thing easy-ish. Perhaps that would be followed up with a more comprehensive dash project.

Current plan for sampled is to use it to prebuild models and distribute them (note: you can do this with base PyMC3 as well). This lets you isolate the modeling step from the training step.
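Something like:

# `models` stands for your own module of prebuilt models (illustrative name)
from models import naive_bayes
import pymc3 as pm

# load_training_data() is a placeholder for your own data loader
corpus, labels = load_training_data()

# The model is defined elsewhere; only the training step happens here
with naive_bayes(corpus=corpus, labels=labels):
    trace = pm.sample()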

I suggest using a variational approximation for that task if the model is large, or using pm.Empirical to wrap a trace. This way you can build a pure Theano graph for the expression of interest with respect to the inputs and save it properly.
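As a rough illustration of the same separation without Theano, the sketch below saves posterior draws to disk and computes a prediction by averaging over them in a process that never imports PyMC3. This swaps the saved-graph approach for plain Python, so treat it as an analogy rather than the method described above. The one-predictor logistic model, the variable names `intercept` and `slope`, and the draw values are all invented for the example.

```python
import json
import math
import os
import tempfile


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def save_draws(path, draws):
    """Training side: write posterior draws as JSON."""
    with open(path, "w") as fh:
        json.dump(draws, fh)


def load_draws(path):
    """Serving side: read the draws back, no modeling library needed."""
    with open(path) as fh:
        return json.load(fh)


def predict_proba(draws, x):
    """Posterior predictive P(y=1 | x): average the per-draw probability."""
    probs = [
        sigmoid(a + b * x)
        for a, b in zip(draws["intercept"], draws["slope"])
    ]
    return sum(probs) / len(probs)


# Placeholder draws standing in for values extracted from a fitted trace,
# e.g. trace["intercept"].tolist(); these numbers are made up.
draws = {
    "intercept": [-0.9, -1.1, -1.0, -0.8, -1.2],
    "slope": [1.9, 2.1, 2.0, 2.2, 1.8],
}

path = os.path.join(tempfile.mkdtemp(), "posterior_draws.json")
save_draws(path, draws)      # done once, after sampling
served = load_draws(path)    # done in the prediction service
p = predict_proba(served, x=0.5)
print(f"P(y=1 | x=0.5) = {p:.3f}")
```

Averaging the probability over draws, rather than plugging in a single point estimate, is what keeps the parameter uncertainty in the served prediction.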

I keep meaning to dig into this too with a simple dash app or something just to see how it works.

Maybe at work I’ll hack together something.

I tried to build an API (https://gist.github.com/springcoil/b8ef6d073349b9102ec49845b757f6b4) with sampled from @colcarroll. Unfortunately I couldn’t quite get it to work. Does anyone else have any insight into making this work? My web API knowledge isn’t great :slight_smile:

I just pushed a small Flask app I’ve been working on – first time trying the cookiecutter app to set something up, but it seems pretty plug and play – see the README for two lines to get it running locally. It exposes a bit of an API, and also creates a UI to interact with it and visualize the results using d3js. I’ll try to add more bells and whistles later :smiley:
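For anyone who wants to see the bare shape of a prediction endpoint, here is a dependency-free sketch using the standard library's WSGI support. It is not the Flask app mentioned above and does not reflect its code; the `/predict` route, the `x` query parameter, and the hard-coded posterior draws are all assumptions made for illustration.

```python
import json
import math
from urllib.parse import parse_qs

# Invented (intercept, slope) posterior draws for a one-predictor logistic model.
DRAWS = [(-1.0, 2.0), (-0.8, 1.7), (-1.2, 2.3)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x):
    """Average the predicted probability over the posterior draws."""
    return sum(sigmoid(a + b * x) for a, b in DRAWS) / len(DRAWS)


def app(environ, start_response):
    """WSGI app: GET /predict?x=0.5 returns a JSON body with the probability."""
    headers = [("Content-Type", "application/json")]
    if environ.get("PATH_INFO") != "/predict":
        start_response("404 Not Found", headers)
        return [b'{"error": "not found"}']
    query = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        x = float(query["x"][0])
        if not math.isfinite(x):
            raise ValueError("x must be finite")
    except (KeyError, ValueError):
        start_response("400 Bad Request", headers)
        return [b'{"error": "pass a finite numeric x"}']
    body = json.dumps({"x": x, "probability": predict_proba(x)})
    start_response("200 OK", headers)
    return [body.encode("utf-8")]


def serve(host="127.0.0.1", port=8000):
    """Run the app locally with the reference WSGI server (development only)."""
    from wsgiref.simple_server import make_server

    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` starts a local server; because `app` is a plain WSGI callable, it can also be exercised directly in tests with a fake `environ` dictionary.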

Hey @twiecki , I’m getting a 404 from this link: https://github.com/parsing-science/ps-toolkit/blob/master/ps_toolkit/pymc3_models/HLM.py

Anyone have an updated link?

P.S. Also, if anyone has additional examples on putting a pymc3 model into production - ideally with the ability to periodically (scheduled or event-triggered) update posteriors with new observations - I’d be very grateful. Thanks!

Hey @dunovank, good to see you here!

We’re working with several clients who run models in production, but there isn’t really a great resource. A replacement for that toolbox (which is also abandonware) can be found here: pymc-learn (pymc-learn/pymc-learn on GitHub), "Practical probabilistic machine learning in Python".

Then there is also Mike Krieger's Medium post, "Automating daily runs for rt.live’s COVID-19 data using Airflow & ECS".

Feel free to contact me directly if you want to discuss more.

Great, thanks @twiecki !

Basically, I have a Docker pod that includes some representation learning and vector indexing services, as well as a really simple binary classifier (like super simple: single predictor, logistic regression). Currently I’m just using sklearn for the classifier, but the data I’m working with has a natural hierarchical structure that would benefit from a partial pooling approach.

Anyway, I will most likely reach out directly at some point in the coming weeks - would be great to catch up and get any advice on deploying pymc3.
