How to save PyMC v5 models?

What is a recommended v5 way to save a PyMC model object?

I looked at pickle but got AttributeError: Can't pickle local object '_make_nice_attr_error.<locals>.fn'.
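For context, this kind of error usually means the object holds a function defined inside another function. The standard-library pickle stores functions by their importable name, so it cannot serialize such "local" functions. Here is a minimal sketch that reproduces the same class of failure in plain Python; `make_handler`, `fn`, and `try_pickle` are hypothetical names for illustration, not PyMC internals.

```python
import pickle


def make_handler():
    # A function defined inside another function is a "local object":
    # pickle stores functions by importable name and cannot look this one up.
    def fn(x):
        return x + 1
    return fn


def try_pickle(obj):
    """Return None on success, or the error message if pickling fails."""
    try:
        pickle.dumps(obj)
        return None
    except (AttributeError, pickle.PicklingError) as err:
        # The exact exception type and wording vary across Python versions.
        return str(err)


error_message = try_pickle(make_handler())  # a string describing the failure
ok_message = try_pickle(len)                # None: len is importable by name
```

Libraries such as cloudpickle and dill serialize functions by value rather than by name, which is why they can often handle objects that plain pickle rejects.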

Maybe dill would work, but I wanted to see what people generally do in the current state.

dill · PyPI

Here’s a tutorial: Using ModelBuilder class for deploying PyMC models — PyMC example gallery


The ModelBuilder class is clearly the way to go, but if you’re looking for a quick and dirty solution, I’ve been wrapping my trace and model inside a Python dict and saving it as a pickle with cloudpickle.

import cloudpickle

# Bundle the model object, the trace, and any extras into one dict.
# Here model_name is the variable holding the PyMC model object.
pickle_filepath = 'path/to/pickle.pkl'
dict_to_save = {'model': model_name,
                'idata': idata,
                'recovery_dict': z_score_recovery_dict,
                }

with open(pickle_filepath, 'wb') as buff:
    cloudpickle.dump(dict_to_save, buff)

Then the load would be:

import cloudpickle
import pymc as pm

pickle_filepath = 'path/to/pickle.pkl'
with open(pickle_filepath, 'rb') as buff:
    model_dict = cloudpickle.load(buff)

idata = model_dict['idata']
model = model_dict['model']

# Reuse the restored model and trace for posterior predictive sampling.
with model:
    ppc_logit = pm.sample_posterior_predictive(idata)

I’ve had issues saving NetCDF files on Databricks, which is why I use pickle instead. As long as you keep the pickle version consistent between saving and loading, you should be OK.
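One way to make the version-consistency advice checkable is to write a small JSON sidecar next to the pickle and compare it before loading. Here is a rough sketch, assuming the standard-library pickle as a stand-in for cloudpickle and recording only the Python version; the helper names `save_with_versions` and `load_with_version_check` are hypothetical, and in practice you would probably record the versions of your installed libraries as well.

```python
import json
import pickle
import platform
import tempfile
from pathlib import Path


def save_with_versions(payload, pickle_path):
    """Pickle the payload and write a JSON sidecar recording the environment."""
    pickle_path = Path(pickle_path)
    with open(pickle_path, "wb") as buff:
        pickle.dump(payload, buff)
    meta = {"python": platform.python_version()}
    pickle_path.with_suffix(".json").write_text(json.dumps(meta))


def load_with_version_check(pickle_path):
    """Refuse to unpickle if the recorded Python version differs."""
    pickle_path = Path(pickle_path)
    meta = json.loads(pickle_path.with_suffix(".json").read_text())
    if meta["python"] != platform.python_version():
        raise RuntimeError(f"saved with Python {meta['python']}")
    with open(pickle_path, "rb") as buff:
        return pickle.load(buff)


# Round trip in a temporary directory (placeholder payload, not real results).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "bundle.pkl"
    save_with_versions({"recovery_dict": {"mean": 0.0, "std": 1.0}}, path)
    restored = load_with_version_check(path)
```

Keeping the metadata in a separate file means the check still runs even when the pickle itself can no longer be read.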


@twiecki What about the case of model checkpointing? I am working on a compute cluster where I may get pre-empted after a certain amount of time. Is there any way I can save the model at set intervals with this workflow, so that I can load it and continue sampling where I left off?

Currently that’s not supported. You could sample in chunks of, say, 100 draws: sample, save, continue, and so on. There’s also mcbackend (pymc-devs/mcbackend on GitHub), a backend for storing MCMC draws by @michaelosthege, which can save traces on another machine.
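The sample-save-continue pattern can be sketched with a toy sampler. The code below is an illustration only, not PyMC: it runs a hand-written random-walk Metropolis chain targeting a standard normal, and the names `sample_chunk` and `run_with_checkpoints` are hypothetical. It saves the draws, the current position, and the random number generator state after every chunk, so a pre-empted job can pick up from the last checkpoint file.

```python
import math
import pickle
import random
import tempfile
from pathlib import Path


def sample_chunk(state, n_draws):
    """Advance a toy random-walk Metropolis chain targeting N(0, 1)."""
    rng = random.Random()
    rng.setstate(state["rng_state"])
    x = state["position"]
    draws = []
    for _ in range(n_draws):
        proposal = x + rng.gauss(0.0, 1.0)
        # Accept with probability min(1, p(proposal) / p(x)).
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        draws.append(x)
    return draws, {"position": x, "rng_state": rng.getstate()}


def run_with_checkpoints(path, total_draws, chunk_size, seed=0):
    """Sample in chunks, saving after each one; resume if a checkpoint exists."""
    path = Path(path)
    if path.exists():
        with open(path, "rb") as buff:
            checkpoint = pickle.load(buff)
    else:
        start = {"position": 0.0, "rng_state": random.Random(seed).getstate()}
        checkpoint = {"draws": [], "state": start}
    while len(checkpoint["draws"]) < total_draws:
        n = min(chunk_size, total_draws - len(checkpoint["draws"]))
        draws, checkpoint["state"] = sample_chunk(checkpoint["state"], n)
        checkpoint["draws"].extend(draws)
        with open(path, "wb") as buff:
            pickle.dump(checkpoint, buff)
    return checkpoint["draws"]


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "chain.pkl"
    first_half = run_with_checkpoints(ckpt, total_draws=100, chunk_size=50)
    resumed = run_with_checkpoints(ckpt, total_draws=200, chunk_size=50)
    uninterrupted = run_with_checkpoints(Path(tmp) / "other.pkl", 200, 200)
```

In this toy the resumed chain reproduces the uninterrupted one exactly, because the full sampler state is stored. That exactness should not be assumed for PyMC, where each separate sampling call would typically run its own tuning phase.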

@twiecki That makes sense, I really appreciate your response! I assume a model sampled 200 times is roughly equivalent to a model that has been sampled 100 times, saved, loaded, and then sampled 100 more times?

Hello Kraftfaust

We also ran into file size constraints in Databricks while saving the .nc files.
As the thread is 2 years old, have you found a better solution to save and load the model to MLflow in Databricks?

Best wishes