PyMC3 with scikit-learn API


#1

I was listening to this talk from Nicole Carlson about how she implemented the scikit-learn API with PyMC3 models, and I thought it was really interesting. She also published her implementation on GitHub: https://github.com/parsing-science/ps-toolkit

As someone coming from the engineering side and learning PyMC3 for fun, I found it hard to track down everything I needed to do things that are really easy with scikit-learn: fitting a model, extracting parameter information, and then predicting on new data. The PyMC3 tutorials are thorough on how to specify a model, fit it, find parameter values, and understand how Bayesian modeling works, but it is harder to find out how to test on unseen data, save a model for later reuse, and so on. Basically, how to make your model production-ready and reusable, going a step further than the Jupyter notebook.

I think PyMC3 would really benefit from closer integration with the scikit-learn API. Less experienced people like me could then use the package at a higher level, worry mostly about defining the Bayesian model, and get an easy interface (fit and predict methods) to play with, which would probably help with getting started. You probably don't want to hide the complexity of PyMC3 behind this; it would simply be a way of making PyMC3 friendlier to beginners and to the ML community in general. Maybe some basic models could come bundled in their own classes, such as LinearRegression, LogisticRegression, Hierarchical Linear Regression, etc., all with simple fit and predict methods. There could even be a separate repository of reusable Bayesian model recipes.
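To make the fit/predict idea concrete, here is a rough sketch of the kind of interface I have in mind. The class name `BayesianLinearRegression` and the parameters `prior_scale` and `noise_scale` are made up for illustration; they are not from PyMC3 or ps-toolkit. To keep the example runnable without a sampler, it assumes a zero-mean Gaussian prior on the weights and a known noise scale, so the posterior has a closed form in NumPy. A PyMC3-backed version would presumably build the model and sample inside `fit` instead.

```python
import numpy as np


class BayesianLinearRegression:
    """Hypothetical estimator with a scikit-learn-style fit/predict interface.

    Assumes a zero-mean Gaussian prior on the weights and a known noise scale,
    so the posterior is available in closed form (no PyMC3 sampling here).
    """

    def __init__(self, prior_scale=10.0, noise_scale=1.0):
        self.prior_scale = prior_scale
        self.noise_scale = noise_scale

    @staticmethod
    def _design(X):
        # Prepend a column of ones so the first weight acts as the intercept.
        X = np.asarray(X, dtype=float)
        return np.column_stack([np.ones(len(X)), X])

    def fit(self, X, y):
        A = self._design(X)
        y = np.asarray(y, dtype=float)
        # Posterior precision = likelihood precision + prior precision.
        precision = (A.T @ A) / self.noise_scale**2 + np.eye(A.shape[1]) / self.prior_scale**2
        self.coef_cov_ = np.linalg.inv(precision)
        self.coef_mean_ = self.coef_cov_ @ A.T @ y / self.noise_scale**2
        return self

    def predict(self, X, return_std=False):
        A = self._design(X)
        mean = A @ self.coef_mean_
        if not return_std:
            return mean
        # Predictive variance = noise variance + uncertainty in the weights.
        var = self.noise_scale**2 + np.einsum("ij,jk,ik->i", A, self.coef_cov_, A)
        return mean, np.sqrt(var)


# Synthetic data: y = 1 + 2x plus Gaussian noise.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 1))
y_train = 1.0 + 2.0 * X_train[:, 0] + rng.normal(scale=0.5, size=200)

model = BayesianLinearRegression(noise_scale=0.5).fit(X_train, y_train)
y_new, y_std = model.predict(np.array([[0.0], [1.0]]), return_std=True)
```

The trailing-underscore attributes (`coef_mean_`, `coef_cov_`) follow the scikit-learn convention for values learned during `fit`, and returning `self` from `fit` allows the chained call shown at the end.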


#2

Also check out Nicole’s new package that brings the scikit-learn API to PyMC3: https://github.com/parsing-science/pymc3_models


#3

That’s a great idea, and I’m glad that Nicole moved forward and started this project. I’m currently working on Gaussian mixture models in my spare time and have a question: PyMC3 already has a GaussianMixture and a Mixture class. I’m leaning towards reimplementing the marginalized model in the new package to keep coupling with PyMC3 minimal. Are there any arguments against that?
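To be clear about what "marginalized" means here: the discrete component assignments are summed out, so each point's log-density is log sum_k w_k N(x | mu_k, sigma_k). Below is a minimal NumPy sketch of that quantity for the 1-D case. The function name `mixture_logp` and its arguments are just for illustration and are not part of PyMC3 or pymc3_models; an actual reimplementation would need the same expression written with the backend's tensor operations.

```python
import numpy as np


def mixture_logp(x, weights, means, sigmas):
    """Log-density of each point under a 1-D Gaussian mixture with the
    component assignments summed out (marginalized)."""
    x = np.asarray(x, dtype=float)[:, None]          # shape (n, 1)
    weights = np.asarray(weights, dtype=float)       # shape (k,)
    means = np.asarray(means, dtype=float)           # shape (k,)
    sigmas = np.asarray(sigmas, dtype=float)         # shape (k,)
    # log N(x | mu_k, sigma_k) for every point/component pair, shape (n, k)
    log_normal = (
        -0.5 * np.log(2.0 * np.pi)
        - np.log(sigmas)
        - 0.5 * ((x - means) / sigmas) ** 2
    )
    # log sum_k w_k N(x | mu_k, sigma_k), computed stably in log space
    return np.logaddexp.reduce(np.log(weights) + log_normal, axis=1)


x = np.array([-2.0, 0.0, 3.0])
logp = mixture_logp(x, weights=[0.3, 0.7], means=[-2.0, 3.0], sigmas=[1.0, 0.5])
```

Working in log space with a log-sum-exp reduction avoids underflow when a point is far from every component, which is the main numerical concern when writing this by hand.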