Best practice for parallel model selection, especially avoidance of recompilation

I am trying to compare multiple models that use different feature extraction procedures or different prior parameters, using WAIC. My approach is as follows.

  1. Create a function that takes the feature extraction and prior parameters, runs HMC, and returns the WAIC value (computed with ArviZ).
  2. Generate a list of feature extraction and prior parameter sets.
  3. Evaluate all the sets from step 2 with the function from step 1 in parallel with joblib (see the sketch below).
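
For concreteness, here is a minimal sketch of that workflow. The model, priors, and helper names (extract_features, prior_sigma, configs) are illustrative placeholders, not my actual feature extraction:

import arviz as az
import pymc as pm
import pytensor.tensor as pt
from joblib import Parallel, delayed

def fit_and_score(extract_features, prior_sigma, X_raw, y):
    # one "parameter set" = a feature extraction function plus prior hyperparameters
    X = extract_features(X_raw)
    with pm.Model():
        w = pm.Normal("w", mu=0.0, sigma=prior_sigma, shape=X.shape[1])
        b = pm.Normal("b", mu=0.0, sigma=prior_sigma)
        sigma = pm.HalfNormal("sigma", sigma=1.0)
        pm.Normal("y_obs", mu=pt.dot(X, w) + b, sigma=sigma, observed=y)
        idata = pm.sample(nuts_sampler="nutpie")  # recompiles on every call
        pm.compute_log_likelihood(idata)          # pointwise log-lik needed by az.waic
    return az.waic(idata)

# configs = [(extractor_a, 1.0), (extractor_a, 5.0), (extractor_b, 1.0), ...]
# results = Parallel(n_jobs=4)(
#     delayed(fit_and_score)(f, s, X_raw, y) for f, s in configs
# )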

This approach works, except for one problem: the long compilation time, especially when I use nutpie+numba as the sampler.
I think recompilation is unavoidable when the feature extraction differs, since the model structure itself changes. However, I hope I can reuse a compiled model when only the prior parameter values are different.

Is there any recommended approach for this kind of issue?

Thank you.

Hey, @lucidfrontier45!

I am not sure if you can avoid recompilation if you are changing your priors. Hopefully someone else can chime in on that specific part.

I do know that if the model itself isn’t changing, you can avoid recompilation when reusing the same model on multiple datasets.

To do that, you have to use nutpie’s native API.

For example, suppose you have a model named model, coded up the way you typically write a PyMC model with a context manager. You would then compile it like this:

compiled_pymc_model = nutpie.compile_pymc_model(model)

and then you can sample using:

idata = nutpie.sample(compiled_pymc_model)

Then when you want to update the data without recompiling the model you can do:

updated_pymc_model = compiled_pymc_model.with_data(**dictionary_object_with_your_new_data)

and then you can sample again, as above, using the new updated_pymc_model object.
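
Putting those pieces together, a minimal self-contained sketch (toy data and a toy model, just for illustration) might look like this:

import numpy as np
import nutpie
import pymc as pm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)

with pm.Model() as model:
    x_ = pm.Data("x", x)
    y_ = pm.Data("y", y)
    beta = pm.Normal("beta", mu=0.0, sigma=5.0)
    sigma = pm.HalfNormal("sigma", sigma=1.0)
    pm.Normal("y_obs", mu=beta * x_, sigma=sigma, observed=y_)

compiled_pymc_model = nutpie.compile_pymc_model(model)  # compile once
idata = nutpie.sample(compiled_pymc_model)

# swap in a new dataset of the same shape and sample again without recompiling
x_new = rng.normal(size=100)
y_new = 2.0 * x_new + rng.normal(size=100)
idata_new = nutpie.sample(compiled_pymc_model.with_data(x=x_new, y=y_new))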

EDIT:
Also, just curious, how long of a compilation time are we talking about here? It really shouldn’t take more than a few seconds to compile. Do you have any loops inside of your model?

@Dekermanjian

Hi. Thank you for sharing your approach.
Yes, I have tried it, and I confirmed it works if we just change the hyperparameters of the prior.
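
Roughly, the kind of setup I mean wraps the prior hyperparameter in pm.Data so that with_data can change it without triggering recompilation (the toy data and names below are just for illustration):

import numpy as np
import nutpie
import pymc as pm
import pytensor.tensor as pt

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.poisson(np.exp(X @ np.array([1.0, -2.0, 3.0]) + 4.0))

with pm.Model() as model:
    X_ = pm.Data("X", X)
    y_ = pm.Data("y", y)
    w_sigma = pm.Data("w_sigma", 5.0)  # prior hyperparameter exposed as mutable data
    w_ = pm.Normal("w", mu=0.0, sigma=w_sigma, shape=(3,))
    b_ = pm.Normal("b", mu=0.0, sigma=w_sigma)
    alpha = pm.HalfNormal("alpha", sigma=5.0)
    pm.NegativeBinomial("y_obs", mu=pt.exp(pt.dot(X_, w_) + b_), alpha=alpha, observed=y_)

compiled = nutpie.compile_pymc_model(model)
idata_wide = nutpie.sample(compiled)                           # prior sigma = 5.0
idata_narrow = nutpie.sample(compiled.with_data(w_sigma=1.0))  # prior sigma = 1.0, no recompilation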

However, this compiled model is not picklable, so I couldn’t use it with joblib.Parallel. Perhaps I can try it with a thread pool in the latest free-threaded (no-GIL) Python 3.13.

Regarding compilation time: for example, this simple NB regression model took about 40 seconds to compile with nutpie+numba in a Google Colab environment.

Hey @lucidfrontier45, I am not sure what kind of hardware Colab is giving you, but locally the model compiles in 10 seconds on my laptop. I did confirm that it also takes me 40 seconds to compile on Colab. If you run it locally, what is your compilation time?

@Dekermanjian

With a Ryzen 5700X, the compilation time was about 10 seconds, but with a Ryzen 5500U it was about 30 seconds. In both cases the sampling time was about 1-2 seconds, so most of the time was spent compiling the model rather than on the actual sampling.

One more strange thing: if you change the model’s output dimension from (N,) to (N, 1), i.e. add one extra trailing dimension, the compilation time doubles.

Hey @lucidfrontier45,

There is a way to speed up compilation; however, in most situations you want to avoid doing this because it trades faster compilation for slower sampling.

EDIT

That being said, at least on my machine, the total time is shorter when I switch the compilation mode, probably because the model is so simple.

Nevermind, I spoke too soon. The faster compilation time was probably because of caching.

You can DISREGARD the following.

You can try changing the pytensor compilation mode like this:

import pymc as pm
import pytensor
import pytensor.tensor as pt

# X, y, and w are the same synthetic data as in the Ray example later in this thread
with pm.Model() as model:
  with pytensor.config.change_flags(mode='FAST_COMPILE'):
    # data
    X_ = pm.Data("X", X)
    y_ = pm.Data("y", y)

    # Prior: P(w|α)
    w_ = pm.Normal("w", mu=0.0, sigma=5.0, shape=(len(w), ))
    b_ = pm.Normal("b", mu=0.0, sigma=5.0)

    alpha = pm.HalfNormal("alpha", sigma=5.0)

    # Predictor: z = f(x, w)
    z = pt.dot(X_, w_) + b_

    # Likelihood: P(y|z)
    pm.NegativeBinomial("y_obs", mu=pt.exp(z), alpha=alpha, observed=y_)

What this does is reduce the number of graph rewrites (optimizations).

Hey @lucidfrontier45,

Since the compiled model can’t be serialized, you could get around that by creating multiple Ray actors, compiling the model only once per actor, and then swapping out the data in parallel.

Running on a local Ray cluster would look like this:

import nutpie
import numpy as np
import pymc as pm
import pytensor.tensor as pt
import pytensor
import ray

print(pm.__version__)

# Start a local ray cluster
ray.init()

np.random.seed(0)

w = [1.0, -2.0, 3.0]
b = 4.0
N = 100

X = np.random.randn(N, len(w))
z = np.dot(X, w) + b
y = np.random.poisson(np.exp(z))

# Define your ray actor
@ray.remote
class NutpieModel:
    def __init__(self, X, y):
        with pm.Model() as self.model:
            # data
            X_ = pm.Data("X", X)
            y_ = pm.Data("y", y)

            # Prior: P(w|α)
            w_ = pm.Normal("w", mu=0.0, sigma=5.0, shape=(len(w), ))
            b_ = pm.Normal("b", mu=0.0, sigma=5.0)

            alpha = pm.HalfNormal("alpha", sigma=5.0)

            # Predictor: z = f(x, w)
            z = pt.dot(X_, w_) + b_

            # Likelihood: P(y|z)
            pm.NegativeBinomial("y_obs", mu=pt.exp(z), alpha=alpha, observed=y_)
        self.compiled_model = nutpie.compile_pymc_model(self.model)

    def sample(self, X = None, y = None):
        if X is None and y is None:
            idata = nutpie.sample(self.compiled_model)
        else:
            idata = nutpie.sample(self.compiled_model.with_data(X=X, y=y))
        return idata

# Create 4 Ray actors
nutpie_models = [NutpieModel.remote(X=X, y=y) for _ in range(4)]

# Each actor (compiled only once) has its sample method called 10 times; the calls run in parallel across the 4 actors
idata_references = [nutpie_model.sample.remote() for nutpie_model in nutpie_models for _ in range(10)]

# Get back the inference data objects
idatas = ray.get(idata_references)

This is just an example and you will need to tailor it to your specific use case, but I think this will help you run in parallel while minimizing compilation.
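
Since your end goal is WAIC, one way to close the loop on top of the example above (a rough, untested sketch; it assumes you add the pointwise log likelihood inside the actor, and that you call pm.set_data first if you swapped the data with with_data) would be:

# inside NutpieModel.sample, after drawing idata:
#
#     if X is not None and y is not None:
#         pm.set_data({"X": X, "y": y}, model=self.model)
#     with self.model:
#         pm.compute_log_likelihood(idata)
#
# back on the driver, score and compare the fitted models with ArviZ:
import arviz as az

waics = [az.waic(idata) for idata in idatas]
comparison = az.compare({f"model_{i}": idata for i, idata in enumerate(idatas)}, ic="waic")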

I really hope this helps!

@Dekermanjian

Thank you so much. I’ll try your suggested approach of compiling the model once in each process and reusing it.
