I’ve put together a tutorial on how I think you could engineer PyMC3 models into production systems, e.g. as web APIs with various endpoints, depending on whether you want a point estimate, a highest-density interval, a full density, etc.
I’ve tried to address what the machine learning lifecycle might look like for a PyMC3 project, how to engineer the service (using FastAPI) and then deploy it to Kubernetes using Bodywork (an open-source ML deployment tool that I contribute to).
The project used in the tutorial is hosted on GitHub, and you should be able to use it as a template for deploying your own projects.
I’ve been involved in the ML engineering and MLOps side of things for a couple of years now, but I’ve been a PyMC3 fanatic for a lot longer. It struck me that there’s almost no talk of Bayesian methods and probabilistic programming in these fields, which is odd, given that there aren’t many other pragmatic approaches for tackling uncertainty in prediction (as far as I can see).
@AlexIoannides thank you for posting here! I had always wondered what good API endpoints would be, as they define the language that we use in prod. Looks like you’ve nailed a great starting point:
- Point
- Interval
- Density
These nouns have a natural mapping to the probabilistic output. I think I’ve been blinded by sklearn-style point-prediction functions that only ever expose a /predict endpoint.
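To make that mapping concrete, here’s a rough FastAPI sketch of how I picture those three endpoints. The `sample_posterior_predictive` helper is a hypothetical stand-in (a dummy Gaussian here) for whatever generates posterior-predictive draws from the real persisted model:

```python
import arviz as az
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Features(BaseModel):
    x: float


def sample_posterior_predictive(x: float) -> np.ndarray:
    """Stand-in for drawing posterior-predictive samples from
    the persisted model; a dummy Gaussian for illustration."""
    return np.random.normal(loc=x, scale=1.0, size=1000)


@app.post("/point")
def point(features: Features) -> dict:
    # point estimate: the median of the posterior-predictive draws
    samples = sample_posterior_predictive(features.x)
    return {"y_pred": float(np.median(samples))}


@app.post("/interval")
def interval(features: Features) -> dict:
    # 95% highest-density interval, computed with ArviZ
    samples = sample_posterior_predictive(features.x)
    lower, upper = az.hdi(samples, hdi_prob=0.95)
    return {"hdi_lower": float(lower), "hdi_upper": float(upper)}


@app.post("/density")
def density(features: Features) -> dict:
    # return the raw draws so the client can work with the full density
    samples = sample_posterior_predictive(features.x)
    return {"samples": samples.tolist()}
```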
Personally, I didn’t think that pickling and persisting the model alongside saving the netCDF inference data was that big a deal. It’s certainly no different from the recommended way of persisting PyTorch models (model definitions and weights are persisted separately).
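For reference, the pattern I have in mind is something like the following. This is a minimal sketch, assuming `model` is a `pm.Model` and `trace` is the result of `pm.sample()`; I’ve used `cloudpickle` because plain `pickle` can struggle with Theano-backed model objects:

```python
import arviz as az
import cloudpickle  # plain pickle can choke on Theano-backed models

# assuming `model` is a pm.Model and `trace` the result of pm.sample()
idata = az.from_pymc3(trace=trace, model=model)
idata.to_netcdf("inference_data.nc")  # persist the posterior as netCDF

with open("model.pkl", "wb") as f:
    cloudpickle.dump(model, f)  # persist the model object separately

# later, inside the web service:
with open("model.pkl", "rb") as f:
    model = cloudpickle.load(f)
idata = az.from_netcdf("inference_data.nc")
```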
What am I missing?
I’ll consider it further and post my comments on GitHub.