Model Compilation Avoidance When Reusing A Model

Suppose you need to add a node to a model's graph after an initial round of sampling, as in this example:

with pm.Model() as demo:
    N_ = pm.MutableData("N_", N_)
    alpha = pm.Exponential('alpha', 0.1)
    beta = pm.Exponential('beta', 0.1)
    p = pm.Beta('p', alpha, beta, shape=n)
    N = pm.Binomial('N', sample_sizes, p, observed=N_)
    idata = pm.sample()
    
with demo:
    M = pm.Binomial('M', population_sizes, p, shape=n)
    posterior_predictive = pm.sample_posterior_predictive(idata, var_names=['M'])

How would one reuse the model with new data (`N_` in this case) without recompiling the model?

There doesn’t seem to be a way of deleting nodes from the graph (deleting `M` in the example above).

Any help would be greatly appreciated.

PyMC will always recompile the functions needed for sampling, even if the model didn’t change, so there’s no point in worrying about that.

However, if you want to avoid modifying the original model, which can be good practice, you can use pymc.model.fgraph.clone_model — PyMC v5.10.2 documentation

Thank you @ricardoV94. That is what I feared. I have a time-series model that I need to reuse ~1200 times for different locations. I am using nutpie, which uses Numba. Numba has an internal cache that holds everything it has ever compiled in a process, and this is causing an LLVM memory issue: as the number of iterations grows, the available memory is exceeded. Memory grows all time - #7 by frankie4fingers - Community Support - Numba Discussion

And unfortunately, only nutpie and the PyMC sampler work for my model; both numpyro and blackjax give nonsensical results. If you have any suggestions, that would be great. Maybe I just need to bring this up with the Numba folks and see if they are willing to allow that internal cache to be cleared at runtime.

nutpie offers a way to reuse a compiled model with shared variables. You have to use their API directly; the compiled nutpie PyMC model has a with_data method that does what you want.

Thank you @ricardoV94. Do you have any suggestions on how I can do this with the fact that I need to do an intermediate sample step and then add a node to the graph? Similar to the example I posted above. Would the clone_model approach circumvent the issue that I am facing?

You can define two separate models: one for sampling with nutpie, and one for posterior predictive.

Posterior predictive sampling doesn’t care which model the trace came from: Out of model predictions with PyMC - PyMC Labs

Neither model needs to be redefined when you change the data, although you will have to use nutpie’s functionality directly to avoid recompiling.

There is no easy way to avoid recompiling the posterior predictive function right now, but that should have a relatively small footprint (and it doesn’t suffer from the growing cache).

Hey @ricardoV94, I have been doing as you suggested. I am able to use nutpie’s API and get the .with_data() method to work. However, clone_model is behaving oddly: I get wildly different posterior predictive results when I use clone_model() than when I just add nodes directly to the model that was sampled. Do the learned parameters transfer over to the cloned model?

@ricardoV94 never mind, I figured it out. I was accessing the variables in the cloned model incorrectly. I had to access them by name, like this: `cloned["variable_name"]`. Thank you for all your help!
