Is there a standard idiom for “check for this cached file (inference data), and if it’s there, read it, otherwise run this sampling command”? Seems like this would be very useful for notebooks that use PyMC on models that are expensive to sample.
I have been hand-writing stuff like this:
import os
import arviz as az

idata_file = "myfilename.nc"
if os.path.exists(idata_file):
    idata = az.from_netcdf(idata_file)
else:
    idata = <some expensive computation>
if not os.path.exists(idata_file):
    az.to_netcdf(idata, idata_file)
but it’s not great.
Probably this could be streamlined with a decorator, as long as one was careful to avoid name collisions.
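A hedged sketch of what such a decorator might look like (the name `cached` and the pickle-based defaults are my own; for inference data you would pass `az.from_netcdf` and `az.to_netcdf` as the `load`/`save` callbacks instead):

```python
import functools
import os
import pickle


def cached(path, load=None, save=None):
    """Cache a function's return value on disk at `path`.

    `load`/`save` default to pickle for illustration only; for PyMC
    inference data you would pass az.from_netcdf / az.to_netcdf.
    """
    def _pickle_load(p):
        with open(p, "rb") as fh:
            return pickle.load(fh)

    def _pickle_save(obj, p):
        with open(p, "wb") as fh:
            pickle.dump(obj, fh)

    load = load or _pickle_load
    save = save or _pickle_save

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if os.path.exists(path):        # cache hit: skip the computation
                return load(path)
            result = func(*args, **kwargs)  # cache miss: compute and store
            save(result, path)
            return result
        return wrapper
    return decorator
```

The name-collision caveat still applies: each model needs its own `path`, otherwise two different models would silently share one cache file.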
I just wrote a function that wraps pm.sample() and some other common functions (e.g. sample_posterior_predictive()) that automatically checks for a cache and loads it if one is available. It's not ideal, but it works well enough at the moment. I'm tempted to turn it into a simple package so I can use it in other projects. It would be great to hear what others do, because this is a rather annoying issue that I have to believe many others have dealt with.
Would this also be able to detect when a model has changed and not just whether the trace file with a given name already exists or not? That would be really helpful.
For my use case, I didn't necessarily want automatic cache invalidation, but if you can create some sort of hash of the model, then it should be pretty straightforward. Since there is Theano magic at play, it may be worth looking through the attributes of the pm.Model to find some sort of unique identifier.
For example, say we have the following model:
import pymc3 as pm

with pm.Model() as model:
    a = pm.Normal("a", 0, 1)
    b = pm.Normal("b", 0, 1)
    mu = a + b
    sigma = pm.HalfNormal("sigma", 2)
    y = pm.Normal("y", mu, sigma)
Then we could use the output of print(model):
a ~ Normal
b ~ Normal
sigma_log__ ~ TransformedDistribution
y ~ Normal
sigma ~ HalfNormal
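One hedged sketch along these lines (the helper name `model_fingerprint` is mine): hash the model's string representation and bake it into the cache filename. Note from the output above that str(model) records distribution names but not their parameters, so e.g. changing a prior's scale would not change the hash.

```python
import hashlib


def model_fingerprint(model_description):
    """Short, stable hash of a model description, e.g. str(model)."""
    digest = hashlib.sha256(model_description.encode("utf-8")).hexdigest()
    return digest[:12]


# Usage sketch: derive the cache filename from the model itself, e.g.
# idata_file = f"trace_{model_fingerprint(str(model))}.nc"
```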
Or we could gather the random, deterministic, and observed variables (using model.free_RVs, model.deterministics, and model.observed_RVs) to create a specific identifier that recognizes the model.
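A minimal sketch of that idea (the function name is mine; it only assumes the model exposes the three attribute lists mentioned above, each entry having a `.name`, as pm.Model variables do):

```python
def model_identifier(model):
    """Build an identifier string from a model's variable names.

    Assumes `model` exposes free_RVs, deterministics, and observed_RVs
    lists whose entries have a .name attribute, as pm.Model does.
    """
    names = sorted(
        v.name
        for v in model.free_RVs + model.deterministics + model.observed_RVs
    )
    return "|".join(names)
```

Variable names alone won't catch a change to a prior or to how the variables are wired together, so this is a coarse identifier rather than a full content hash.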
What would be ideal for reproducibility, though I have no idea whether it's possible at all, would be to somehow store the lines of code within the model context as a string, which could then be added to the InferenceData object as an attribute.
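One possible route, if the model is built inside a function rather than a bare with-block: `inspect.getsource` can recover that function's source. A sketch (`build_linear_model` is a hypothetical example, and attaching the string via `idata.posterior.attrs` assumes the usual xarray attrs dict on the posterior group):

```python
import inspect


def build_linear_model():
    """Hypothetical model-building function; its source is recoverable."""
    with pm.Model() as model:
        ...
    return model


# Capture the model-building code as a string...
model_source = inspect.getsource(build_linear_model)

# ...and, after sampling, stash it on the InferenceData, e.g.:
# idata.posterior.attrs["model_source"] = model_source
```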