Serializing models

Hey guys,
in my current project my goal is to create a variety of Bayesian models (ranging from simple spline models to GPs) in PyMC for my data and eventually compare the results. Since I want to keep track of all the runs of the different models I have already executed, I am looking for an infrastructure that can track the following points:

  • Prior model before inference
  • Model parameters like mean and std of prior distributions
  • Dataframe that was used for the run
  • Resulting InferenceData object

Afterwards, it should be possible to re-initialize past runs. I had a quick look at MLflow, which is absolutely capable of the last three points but doesn’t support PyMC models, which makes the first point a bit difficult. I am hoping to find a kind of all-in-one solution.

Any suggestions? Thanks in advance!

Hi!

What do you mean “does not support pymc models”? Can you not simply wrap the model in mlflow.pyfunc.PythonModel? Or just store it as an artifact (this only makes sense if you don’t need MLflow’s serving or predict functionality, which won’t work without an “actual model”)?

https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html
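
For the artifact route, here is a minimal sketch of one way it could look, using only the standard library. It assumes you don't serialize the model object itself. Instead you store a small "run spec" (the name of the function that builds the model, the prior parameters, and a hash of the data file) and rebuild the model from that. All names here (RunSpec, BUILDERS, spline_model, run_spec.json) are made up for illustration, and spline_model is a placeholder standing in for your real PyMC model code.

```python
import hashlib
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class RunSpec:
    """Hypothetical record of what is needed to rebuild a run."""
    builder: str                                # name of the model-building function
    priors: dict = field(default_factory=dict)  # e.g. {"beta": {"mu": 0.0, "sigma": 1.0}}
    data_sha256: str = ""                       # fingerprint of the data file


def file_sha256(path):
    """Hash the raw bytes of the data file so a later change is detected."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def save_spec(spec, run_dir):
    """Write the spec as JSON into the run directory and return the file path."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    out = run_dir / "run_spec.json"
    out.write_text(json.dumps(asdict(spec), indent=2, sort_keys=True))
    return out


def load_spec(run_dir):
    """Read the JSON spec back into a RunSpec."""
    return RunSpec(**json.loads((Path(run_dir) / "run_spec.json").read_text()))


# Registry mapping builder names to functions (an assumption of this sketch).
BUILDERS = {}


def register(fn):
    BUILDERS[fn.__name__] = fn
    return fn


@register
def spline_model(priors, data_path):
    # Placeholder: real code would build and return a PyMC model here.
    return {"kind": "spline", "priors": priors, "data": str(data_path)}


def rebuild(run_dir, data_path):
    """Recreate the model for a stored run, refusing if the data has changed."""
    spec = load_spec(run_dir)
    if file_sha256(data_path) != spec.data_sha256:
        raise ValueError("data file differs from the one used in the original run")
    return BUILDERS[spec.builder](spec.priors, data_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_path = Path(tmp) / "data.csv"
        data_path.write_text("x,y\n0,1.0\n1,2.1\n")
        spec = RunSpec("spline_model",
                       {"beta": {"mu": 0.0, "sigma": 1.0}},
                       file_sha256(data_path))
        save_spec(spec, Path(tmp) / "run_001")
        print(rebuild(Path(tmp) / "run_001", data_path))
```

The run directory could then be uploaded with mlflow.log_artifacts(run_dir), and the InferenceData could be written into the same directory first via its to_netcdf method. Storing a builder name plus parameters rather than a pickled model is a design choice of this sketch: the JSON stays readable and diffable, but it only works as long as the builder function itself is kept under version control.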