Deploying Bayesian Models with PyMC3

As a disclaimer, I’m new to Bayesian statistics, having just read McElreath’s Statistical Rethinking. (1) What sort of workflow is needed to deploy a Bayesian model into production, (2) how does it differ from deploying a typical ML model, and (3) how can you accomplish these steps with PyMC3? (4) Are there any good resources out there for this? I know some of these are broad questions, and this post is filled with other questions, so I’ve numbered them for easier reference.

(5) I would guess that if you deploy a Bayesian model into production, you’d need to run trace checks each time the model is retrained to catch convergence issues. Is that true, and is there a way to automate trace checking? (I’m guessing someone would otherwise have to check manually each time.) (6, related to 1) Are any other steps like (5) necessary when deploying a Bayesian model that might not be needed for a typical ML production model?
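To make (5) concrete, here is a minimal sketch of what an automated check could look like: compute the split R-hat diagnostic for a parameter across chains and fail the retraining job if it is too high. This is plain Python so it runs anywhere. The function names and the 1.01 threshold are assumptions for illustration, not a PyMC3 API. In a real pipeline, ArviZ's `az.summary` reports `r_hat` and effective sample size for every parameter, and you would probably want to gate on divergences as well.

```python
import random
import statistics


def split_rhat(chains):
    """Split R-hat for one scalar parameter.

    `chains` is a list of equal-length lists of posterior draws.
    """
    half = len(chains[0]) // 2
    # Split every chain in two so that drift inside a chain
    # also shows up as disagreement between "chains".
    halves = []
    for chain in chains:
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])

    n = half
    chain_means = [statistics.fmean(h) for h in halves]
    within = statistics.fmean([statistics.variance(h) for h in halves])
    between = n * statistics.variance(chain_means)
    var_hat = (n - 1) / n * within + between / n
    return (var_hat / within) ** 0.5


def passes_convergence_gate(chains, rhat_threshold=1.01):
    """Hypothetical deployment gate: True if the chains agree closely enough."""
    return split_rhat(chains) < rhat_threshold


rng = random.Random(0)
# Four chains sampling the same distribution: should pass the gate.
good = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four chains stuck in different places: should fail the gate.
bad = [[rng.gauss(3.0 * k, 1.0) for _ in range(1000)] for k in range(4)]

print("good chains pass:", passes_convergence_gate(good))
print("bad chains pass:", passes_convergence_gate(bad))
```

A job scheduler could run a check like this after every refit and refuse to promote the new model (or alert a human) when it fails, so the manual trace inspection is only needed for the failures.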

(7) Can online learning be implemented with a Bayesian Hierarchical Regression in PyMC3? Why/why not?

(8) Are there frequentist alternatives to deploying a hierarchical regression (by hierarchical I mean with nested categorical data) as a Bayesian model? What are the advantages/disadvantages? (I’ve worked with lme4 before, but I’d say that’s not very feasible in a production environment.)

(9) Is variational inference the best way to scale a Bayesian model in production to big data?

This might not be an appropriate way of asking so many questions on a forum, and it might not be PyMC3 focused enough, so apologies if that’s the case; I’m new to this field and I’ve had trouble finding resources.

I’m not an expert on deployment, but I can try to answer a few of the others:

(7) Yes. One of the places the Bayesian approach excels is in online learning situations, because “prior / observations” translates naturally to “trained model / new data.” You’d train a Bayesian model (hierarchical regression or any other) on whatever data is available, then as new data comes in you’d treat the old posterior as the new prior.
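As a toy illustration of "old posterior becomes new prior," here is a sketch using a conjugate Normal model with known noise variance, where the update has a closed form. The function name and the numbers are made up for the example. It shows that updating batch by batch gives the same answer as refitting on all the data at once. With MCMC in PyMC3 it is less direct, since the posterior comes back as samples, so you have to approximate it before reusing it as a prior. The PyMC3 docs have an "Updating priors" example that does this per parameter with `pm.Interpolated`, which, as far as I can tell, does not carry over correlations between parameters.

```python
def update_normal_mean(prior_mean, prior_var, observations, noise_var):
    """Posterior over an unknown mean.

    Assumes a Normal prior on the mean and Normal observation noise
    with known variance `noise_var` (the conjugate case).
    """
    n = len(observations)
    if n == 0:
        return prior_mean, prior_var
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / noise_var)
    return post_mean, post_var


batch_1 = [2.1, 1.9, 2.4]
batch_2 = [2.0, 2.2]

# Online: the posterior after batch 1 is used as the prior for batch 2.
mean_1, var_1 = update_normal_mean(0.0, 10.0, batch_1, noise_var=1.0)
online = update_normal_mean(mean_1, var_1, batch_2, noise_var=1.0)

# Offline: refit from the original prior on everything at once.
offline = update_normal_mean(0.0, 10.0, batch_1 + batch_2, noise_var=1.0)

print("online: ", online)
print("offline:", offline)
```

The two results agree up to floating-point error, which is the property that makes the online scheme attractive. Once the posterior has to be approximated between updates, that exact equivalence no longer holds, and the approximation error can accumulate.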

I read Bayesian Product Ranking at Wayfair earlier this year, and it’s a great example. They use a slightly more sophisticated approach (keeping most of the posterior but intentionally forgetting some information, through some autoregressive process that I don’t fully understand), but the core idea is the same.
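To be clear, the following is not Wayfair's method, just a generic sketch of what "keep most of the posterior but forget some of it" can look like. It uses a Beta-Bernoulli model where, before each update, the accumulated evidence is shrunk back toward the base prior by a factor `keep`. The function name, the `keep` parameter, and the data are all hypothetical.

```python
def discounted_beta_update(alpha, beta, successes, failures,
                           keep=0.8, prior_alpha=1.0, prior_beta=1.0):
    """One Beta-Bernoulli update with exponential forgetting.

    keep=1.0 is the ordinary conjugate update; keep<1.0 shrinks the
    evidence gathered so far toward the base prior before adding new counts.
    """
    alpha = prior_alpha + keep * (alpha - prior_alpha) + successes
    beta = prior_beta + keep * (beta - prior_beta) + failures
    return alpha, beta


# Made-up stream: the success rate drops from about 0.8 to about 0.2.
periods = [(8, 2)] * 20 + [(2, 8)] * 5

full = (1.0, 1.0)       # never forgets
forgetful = (1.0, 1.0)  # forgets 20% of old evidence each period
for successes, failures in periods:
    full = discounted_beta_update(*full, successes, failures, keep=1.0)
    forgetful = discounted_beta_update(*forgetful, successes, failures, keep=0.8)

full_mean = full[0] / (full[0] + full[1])
forgetful_mean = forgetful[0] / (forgetful[0] + forgetful[1])
print("posterior mean, full memory:", round(full_mean, 3))
print("posterior mean, forgetting: ", round(forgetful_mean, 3))
```

The forgetting version tracks the recent rate much more closely, at the cost of a wider posterior, since it never accumulates more than a bounded amount of evidence.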

(9) Variational inference is definitely one way, but in my experience it depends pretty heavily on your use case and what “big” data is. Another idea might be to reformulate your problem into several smaller models that can train at different levels of aggregation, though this is very problem-specific and by no means one-size-fits-all.


Thanks, I appreciate the feedback. That Wayfair article is awesome, by the way; thanks for sharing it.

In terms of reformulating the problem into several smaller models, I’m hoping to understand how to deploy the best possible model, so understanding how to deploy one model with partial pooling is probably preferable to several smaller ones. This is all theoretical anyway.