How to save PyMC v5 models?

What is a recommended v5 way to save a PyMC model object?

I looked at pickle but got AttributeError: Can't pickle local object '_make_nice_attr_error.<locals>.fn'.
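For context on that error, here is a minimal sketch of why the standard library pickle rejects such objects: it stores functions by reference to their importable name, so a function defined inside another function cannot be looked up again. The names `make_handler`, `Holder`, and `try_pickle` are hypothetical stand-ins, not PyMC internals.

```python
import pickle


def make_handler(name):
    # Hypothetical factory that returns a locally defined function,
    # similar in shape to the `<locals>.fn` named in the error above.
    def fn():
        return f"no attribute {name}"
    return fn


class Holder:
    # Hypothetical object that keeps a reference to the local function.
    def __init__(self):
        self.handler = make_handler("x")


def try_pickle(obj):
    # Return the error message if pickling fails, otherwise None.
    try:
        pickle.dumps(obj)
        return None
    except (AttributeError, pickle.PicklingError) as err:
        return str(err)


message = try_pickle(Holder())
print(message)
```

Libraries such as dill and cloudpickle serialize functions by value instead, which is why they are often suggested for objects like this.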

Maybe dill would work, but I wanted to see what people generally do in the current state.


Here’s a tutorial: Using ModelBuilder class for deploying PyMC models — PyMC example gallery


The ModelBuilder class is clearly the way to go, but if you’re looking for a quick and dirty solution, I’ve been wrapping my trace and model inside a Python dict and saving it as a pickle with cloudpickle.

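import cloudpickle

pickle_filepath = 'path/to/pickle.pkl'

# `model_name` is the pm.Model object and `idata` is the InferenceData
# returned by sampling; `z_score_recovery_dict` is extra data I want to keep.
dict_to_save = {
    'model': model_name,
    'idata': idata,
    'recovery_dict': z_score_recovery_dict,
}

with open(pickle_filepath, 'wb') as buff:
    cloudpickle.dump(dict_to_save, buff)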

Then the load would be:

import cloudpickle
import pymc as pm

pickle_filepath = 'path/to/pickle.pkl'
with open(pickle_filepath, 'rb') as buff:
    model_dict = cloudpickle.load(buff)

idata = model_dict['idata']
model = model_dict['model']

# Reuse the restored model and trace for posterior predictive sampling.
with model:
    ppc_logit = pm.sample_posterior_predictive(idata)

I’ve had issues saving NetCDF files on Databricks, which is why I use pickle instead. As long as you keep the pickle version consistent between saving and loading, you should be OK.
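One way to make the version caveat checkable is to store a small environment fingerprint next to the payload and compare it at load time. This is only a sketch under assumptions: the helper names are hypothetical, the package list is a guess at what matters, and the standard library pickle stands in for cloudpickle so the example has no third-party dependencies.

```python
import pickle
import sys
from importlib import metadata


def installed_version(package):
    # Return the installed version string, or None if the package is absent.
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_fingerprint(packages=("pymc", "arviz", "cloudpickle")):
    # Assumed set of versions worth tracking; adjust to your own stack.
    return {
        "python": tuple(sys.version_info[:2]),
        "packages": {name: installed_version(name) for name in packages},
    }


def save_with_fingerprint(payload, path):
    # Bundle the payload with the environment it was saved from.
    record = {"env": environment_fingerprint(), "payload": payload}
    with open(path, "wb") as buff:
        pickle.dump(record, buff)


def load_with_check(path):
    # Return the payload plus a flag saying whether the environment changed.
    with open(path, "rb") as buff:
        record = pickle.load(buff)
    mismatched = record["env"] != environment_fingerprint()
    return record["payload"], mismatched
```

A caller could warn or refuse to proceed when the flag is true. Note that if the versions differ badly enough, unpickling itself may fail before the check runs.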


@twiecki What about the case of model checkpointing? I am working on a compute cluster where I may get pre-empted after a certain amount of time. Is there any way I can save the model at set intervals with this workflow, so that I can load it and continue sampling where I left off?

Currently that’s not supported. You could sample in chunks of, say, 100 draws: sample, save, then continue, and so on. There’s also mcbackend (pymc-devs/mcbackend on GitHub, a backend for storing MCMC draws) by @michaelosthege to save traces on another machine.
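The sample, save, continue pattern can be sketched with a toy sampler. This is not PyMC code: `run_chunk` is a hypothetical pure-Python random-walk Metropolis sampler standing in for the real one, and the checkpoint layout (a dict with the current position, the draws so far, and the RNG state) is an assumption made for illustration.

```python
import math
import os
import pickle
import random


def log_density(x):
    # Toy target: standard normal log-density up to a constant.
    return -0.5 * x * x


def initial_state(seed):
    # Everything needed to continue the chain later.
    return {"x": 0.0, "draws": [], "rng_state": random.Random(seed).getstate()}


def run_chunk(state, n_draws):
    # Advance the toy chain by n_draws and return the updated state.
    rng = random.Random()
    rng.setstate(state["rng_state"])
    x = state["x"]
    draws = list(state["draws"])
    for _ in range(n_draws):
        proposal = x + rng.gauss(0.0, 1.0)
        diff = log_density(proposal) - log_density(x)
        if diff >= 0 or rng.random() < math.exp(diff):
            x = proposal
        draws.append(x)
    return {"x": x, "draws": draws, "rng_state": rng.getstate()}


def sample_with_checkpoints(path, total_draws, chunk_size, seed=0):
    # Resume from an existing checkpoint, otherwise start fresh.
    if os.path.exists(path):
        with open(path, "rb") as buff:
            state = pickle.load(buff)
    else:
        state = initial_state(seed)
    while len(state["draws"]) < total_draws:
        n = min(chunk_size, total_draws - len(state["draws"]))
        state = run_chunk(state, n)
        # Write to a temporary file first so a pre-emption mid-write
        # cannot corrupt the last good checkpoint.
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as buff:
            pickle.dump(state, buff)
        os.replace(tmp_path, path)
    return state["draws"]
```

In this toy, a resumed run reproduces an uninterrupted one only because the full sampler state, including the RNG state, is checkpointed. Whether chunked sampling in PyMC behaves the same way depends on what state carries over between calls, which this sketch does not address.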

@twiecki That makes sense, I really appreciate your response! I assume a model sampled 200 times is roughly equivalent to a model that has been sampled 100 times, saved, loaded, and sampled 100 more times.