Best practice for parallel model selection, especially avoidance of recompilation

I am trying to compare multiple models that use different feature extraction procedures or different prior parameters, using WAIC. My approach is as follows.

  1. Create a function that takes the feature extraction and prior parameters, runs HMC, and returns the WAIC value (computed with ArviZ).
  2. Generate a list of feature extraction and prior parameter sets.
  3. Evaluate all the sets from step 2 with the function from step 1 in parallel with joblib (see the sketch below).
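
For concreteness, here is a minimal sketch of that workflow. The model, priors, and helper names (extract_features, prior_sigma, configs) are illustrative placeholders, not my actual feature extraction:

import arviz as az
import pymc as pm
import pytensor.tensor as pt
from joblib import Parallel, delayed

def fit_and_score(extract_features, prior_sigma, X_raw, y):
    # one "parameter set" = a feature extraction function plus prior hyperparameters
    X = extract_features(X_raw)
    with pm.Model():
        w = pm.Normal("w", mu=0.0, sigma=prior_sigma, shape=X.shape[1])
        b = pm.Normal("b", mu=0.0, sigma=prior_sigma)
        sigma = pm.HalfNormal("sigma", sigma=1.0)
        pm.Normal("y_obs", mu=pt.dot(X, w) + b, sigma=sigma, observed=y)
        idata = pm.sample(nuts_sampler="nutpie")  # recompiles on every call
        pm.compute_log_likelihood(idata)          # pointwise log-lik needed by az.waic
    return az.waic(idata)

# configs = [(extractor_a, 1.0), (extractor_a, 5.0), (extractor_b, 1.0), ...]
# results = Parallel(n_jobs=4)(
#     delayed(fit_and_score)(f, s, X_raw, y) for f, s in configs
# )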

This approach works, except for one problem: the long compilation time, especially when I use nutpie+numba as the sampler.
I think recompilation is unavoidable when the feature extraction differs, since the model structure itself changes. However, I hope I can reuse a compiled model when only the prior parameter values are different.

Is there any recommended approach for this kind of issue?

Thank you.

Hey, @lucidfrontier45!

I am not sure if you can avoid recompilation if you are changing your priors. Hopefully someone else can chime in on that specific part.

I do know that if the model itself isn’t changing, you can avoid recompilation when reusing the same model on multiple datasets.

To do that, you have to use nutpie’s native API.

For example, suppose you have a model named model, coded up the way you typically write a PyMC model with a context manager. You would then compile it like this:

compiled_pymc_model = nutpie.compile_pymc_model(model)

and then you can sample using:

idata = nutpie.sample(compiled_pymc_model)

Then when you want to update the data without recompiling the model you can do:

updated_pymc_model = compiled_pymc_model.with_data(**dictionary_object_with_your_new_data)

and then you can sample again, as above, using the new updated_pymc_model object.
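
Putting those pieces together, a minimal self-contained sketch (toy data and a toy model, just for illustration) might look like this:

import numpy as np
import nutpie
import pymc as pm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)

with pm.Model() as model:
    x_ = pm.Data("x", x)
    y_ = pm.Data("y", y)
    beta = pm.Normal("beta", mu=0.0, sigma=5.0)
    sigma = pm.HalfNormal("sigma", sigma=1.0)
    pm.Normal("y_obs", mu=beta * x_, sigma=sigma, observed=y_)

compiled_pymc_model = nutpie.compile_pymc_model(model)  # compile once
idata = nutpie.sample(compiled_pymc_model)

# swap in a new dataset of the same shape and sample again without recompiling
x_new = rng.normal(size=100)
y_new = 2.0 * x_new + rng.normal(size=100)
idata_new = nutpie.sample(compiled_pymc_model.with_data(x=x_new, y=y_new))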

EDIT:
Also, just curious, how long of a compilation time are we talking about here? It really shouldn’t take more than a few seconds to compile. Do you have any loops inside of your model?

@Dekermanjian

Hi. Thank you for sharing your approach.
Yes, I have tried it, and I confirmed it works if we just change the hyperparameters of the prior.
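
Roughly, the kind of setup I mean wraps the prior hyperparameter in pm.Data so that with_data can change it without triggering recompilation (the toy data and names below are just for illustration):

import numpy as np
import nutpie
import pymc as pm
import pytensor.tensor as pt

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.poisson(np.exp(X @ np.array([1.0, -2.0, 3.0]) + 4.0))

with pm.Model() as model:
    X_ = pm.Data("X", X)
    y_ = pm.Data("y", y)
    w_sigma = pm.Data("w_sigma", 5.0)  # prior hyperparameter exposed as mutable data
    w_ = pm.Normal("w", mu=0.0, sigma=w_sigma, shape=(3,))
    b_ = pm.Normal("b", mu=0.0, sigma=w_sigma)
    alpha = pm.HalfNormal("alpha", sigma=5.0)
    pm.NegativeBinomial("y_obs", mu=pt.exp(pt.dot(X_, w_) + b_), alpha=alpha, observed=y_)

compiled = nutpie.compile_pymc_model(model)
idata_wide = nutpie.sample(compiled)                           # prior sigma = 5.0
idata_narrow = nutpie.sample(compiled.with_data(w_sigma=1.0))  # prior sigma = 1.0, no recompilation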

However, this compiled model is not picklable, so I couldn’t use it with joblib.Parallel. Perhaps I can try it with a thread pool in the latest free-threaded (no-GIL) Python 3.13.

Regarding compilation time: for example, this simple NB regression model took about 40 seconds to compile with nutpie+numba in a Google Colab environment.

Hey @lucidfrontier45, I am not sure what kind of hardware Colab is giving you, but locally the model compiles in 10 seconds on my laptop. I did confirm that it also takes me 40 seconds to compile on Colab. If you run it locally, what is your compilation time?

@Dekermanjian

With a Ryzen 5700X, the compilation time was about 10 seconds, but with a Ryzen 5500U it was about 30 seconds. In both cases the sampling time was about 1-2 seconds, so most of the time was spent compiling the model rather than on the actual sampling.

One more strange thing: if you change the model’s output dimension from (N,) to (N, 1), i.e. add one extra trailing dimension, the compilation time doubles.

Hey @lucidfrontier45,

There is a way to speed up compilation; however, in most situations you want to avoid doing this because it trades faster compilation for slower sampling.

EDIT

That being said, at least on my machine, the total time is shorter when I switch the compilation mode, probably because the model is so simple.

Nevermind, I spoke too soon. The faster compilation time was probably because of caching.

You can DISREGARD the following.

You can try changing the pytensor compilation mode like this:

import pymc as pm
import pytensor
import pytensor.tensor as pt

# X, y, and w are the same synthetic data as in the Ray example later in this thread
with pm.Model() as model:
  with pytensor.config.change_flags(mode='FAST_COMPILE'):
    # data
    X_ = pm.Data("X", X)
    y_ = pm.Data("y", y)

    # Prior: P(w|α)
    w_ = pm.Normal("w", mu=0.0, sigma=5.0, shape=(len(w), ))
    b_ = pm.Normal("b", mu=0.0, sigma=5.0)

    alpha = pm.HalfNormal("alpha", sigma=5.0)

    # Predictor: z = f(x, w)
    z = pt.dot(X_, w_) + b_

    # Likelihood: P(y|z)
    pm.NegativeBinomial("y_obs", mu=pt.exp(z), alpha=alpha, observed=y_)

What this does is reduce the number of graph rewrites (optimizations).

Hey @lucidfrontier45,

Since the compiled model can’t be serialized, you could get around that by creating multiple Ray actors, compiling the model only once per actor, and then swapping out the data in parallel.

Running on a local Ray cluster would look like this:

import nutpie
import numpy as np
import pymc as pm
import pytensor.tensor as pt
import pytensor
import ray

print(pm.__version__)

# Start a local ray cluster
ray.init()

np.random.seed(0)

w = [1.0, -2.0, 3.0]
b = 4.0
N = 100

X = np.random.randn(N, len(w))
z = np.dot(X, w) + b
y = np.random.poisson(np.exp(z))

# Define your ray actor
@ray.remote
class NutpieModel:
    def __init__(self, X, y):
        with pm.Model() as self.model:
            # data
            X_ = pm.Data("X", X)
            y_ = pm.Data("y", y)

            # Prior: P(w|α)
            w_ = pm.Normal("w", mu=0.0, sigma=5.0, shape=(len(w), ))
            b_ = pm.Normal("b", mu=0.0, sigma=5.0)

            alpha = pm.HalfNormal("alpha", sigma=5.0)

            # Predictor: z = f(x, w)
            z = pt.dot(X_, w_) + b_

            # Likelihood: P(y|z)
            pm.NegativeBinomial("y_obs", mu=pt.exp(z), alpha=alpha, observed=y_)
        self.compiled_model = nutpie.compile_pymc_model(self.model)

    def sample(self, X = None, y = None):
        if X is None and y is None:
            idata = nutpie.sample(self.compiled_model)
        else:
            idata = nutpie.sample(self.compiled_model.with_data(X=X, y=y))
        return idata

# Create 4 Ray actors
nutpie_models = [NutpieModel.remote(X=X, y=y) for _ in range(4)]

# Each actor (compiled only once) has its sample method called 10 times; the calls run in parallel across the 4 actors
idata_references = [nutpie_model.sample.remote() for nutpie_model in nutpie_models for _ in range(10)]

# Get back the inference data objects
idatas = ray.get(idata_references)

This is just an example and you will need to tailor it to your specific use case, but I think this will help you run in parallel while minimizing compilation.
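
Since your end goal is WAIC, one way to close the loop on top of the example above (a rough, untested sketch; it assumes you add the pointwise log likelihood inside the actor, and that you call pm.set_data first if you swapped the data with with_data) would be:

# inside NutpieModel.sample, after drawing idata:
#
#     if X is not None and y is not None:
#         pm.set_data({"X": X, "y": y}, model=self.model)
#     with self.model:
#         pm.compute_log_likelihood(idata)
#
# back on the driver, score and compare the fitted models with ArviZ:
import arviz as az

waics = [az.waic(idata) for idata in idatas]
comparison = az.compare({f"model_{i}": idata for i, idata in enumerate(idatas)}, ic="waic")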

I really hope this helps!

@Dekermanjian

Thank you so much. I’ll try your suggested approach of compiling the model once in each process and reusing it.
