Model Compilation Avoidance When Reusing A Model

Suppose you need to add a node to a model's graph after an initial round of sampling, as in this example:

with pm.Model() as demo:
    N_ = pm.MutableData("N_", N_)
    alpha = pm.Exponential('alpha', 0.1)
    beta = pm.Exponential('beta', 0.1)
    p = pm.Beta('p', alpha, beta, shape=n)
    N = pm.Binomial('N', sample_sizes, p, observed=N_)
    idata = pm.sample()
    
with demo:
    M = pm.Binomial('M', population_sizes, p, shape=n)
    posterior_predictive = pm.sample_posterior_predictive(idata, var_names=['M'])

How would one reuse the model with new data (`N_` in this case) without recompiling the model?

There doesn’t seem to be a way of deleting nodes from the graph (deleting `M` in the example above).

Any help would be greatly appreciated.

PyMC will always recompile the functions needed for sampling, even if the model didn’t change, so there’s no point in worrying about that.

However, if you want to avoid modifying the original model, which can be good practice, you can use pymc.model.fgraph.clone_model — PyMC v5.10.2 documentation

Thank you @ricardoV94. That is what I feared. I have a time-series model that I need to reuse ~1200 times for different locations. I am using nutpie, which uses Numba. Numba has an internal cache that holds everything it has ever compiled in a process, and this is causing an LLVM memory issue: as the number of iterations grows, the available memory is exceeded. Memory grows all time - #7 by frankie4fingers - Community Support - Numba Discussion

And unfortunately, only nutpie and the PyMC sampler work for my model; both numpyro and blackjax give nonsensical results. If you have any suggestions, that would be great. Maybe I just need to bring this up with the Numba folks and see if they are willing to allow that internal cache to be cleared at runtime.

nutpie offers a way to reuse a compiled model with shared variables. You have to use their API directly; the compiled nutpie PyMC model has a with_data method that does what you want.

Thank you @ricardoV94. Do you have any suggestions on how I can do this with the fact that I need to do an intermediate sample step and then add a node to the graph? Similar to the example I posted above. Would the clone_model approach circumvent the issue that I am facing?

You can define two separate models: one for sampling with nutpie, and one for posterior predictive.

Posterior predictive sampling doesn’t care which model the trace came from: Out of model predictions with PyMC - PyMC Labs

Neither model needs to be redefined when you change the data, although you will have to use nutpie’s functionality directly to avoid recompiling.

There is no easy way to avoid recompiling the posterior predictive function right now, but that should have a relatively small footprint (and it doesn’t suffer from the growing cache).

Hey @ricardoV94, I have been doing as you suggested. I am able to use nutpie’s API and get the .with_data() method to work. However, clone_model is behaving oddly: I get wildly different posterior predictive results when I use clone_model() than when I just add nodes directly to the model that was sampled. Do the learned parameters transfer over to the cloned model?

@ricardoV94 never mind, I figured it out. I was accessing the variables in the cloned model incorrectly. I had to access them by name, like this: `cloned["variable_name"]`. Thank you for all your help!
