Running models in parallel


#1

I’m trying to speed up computation of a larger model, and I’m wondering whether I could run the fits in parallel.
For context, the model is hierarchical with several levels (Warehouse > Product > SKU) and already uses broadcasting (via shape). To process it for 52 weeks and 3 planning scenarios, I run it in a larger loop. Are there any concerns with running that loop in parallel instead of serially?
I’m not very familiar with Theano, so I don’t know whether this could cause hard-to-detect bugs if Theano doesn’t keep the parallel runs neatly separated. I hope this makes sense.

Here’s kinda what I’m thinking:

from joblib import Parallel, parallel_backend, delayed

def run_model(data, epoch):
    # get_model and process_model are defined elsewhere
    model = get_model(data)
    trace = process_model(model, epoch)
    return trace

def main_parallel(data, epochs, n_jobs):
    # Run the workers as separate processes rather than threads
    with parallel_backend('multiprocessing'):
        traces = Parallel(n_jobs=n_jobs, verbose=10)(
            delayed(run_model)(data, epoch)
            for epoch in range(epochs)
        )
    return traces

#2

Theano itself is not thread safe, so you shouldn’t use Python threads for the parallelisation. multiprocessing is fine, though. I’d switch off the chain parallelisation in PyMC (pass njobs=1 to pm.sample): it probably doesn’t make much sense in combination with what you are doing, and nesting multiprocessing inside multiprocessing is, I guess, asking for trouble.
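
As a rough sketch of that layout, assuming one independent fit per (week, scenario) pair: the scenario names and the body of `run_one` below are hypothetical placeholders, and the real worker would build the model and call pm.sample with njobs=1. Only the standard library is used here, so this is an illustration of the structure and not the joblib code from the question.

```python
import itertools
import multiprocessing as mp


def build_tasks(n_weeks=52, scenarios=("base", "low", "high")):
    """Enumerate every (week, scenario) pair as one independent job.

    The scenario names are made up for illustration.
    """
    return list(itertools.product(range(n_weeks), scenarios))


def run_one(task):
    """Fit a single model inside a worker process.

    Placeholder body: real code would build the model here and call
    pm.sample(..., njobs=1) so the worker does not start processes of its own.
    """
    week, scenario = task
    return {"week": week, "scenario": scenario}


def run_all(tasks, n_procs=4):
    """Map the tasks over separate processes (not threads)."""
    with mp.Pool(processes=n_procs) as pool:
        return pool.map(run_one, tasks)
```

Calling `run_all(build_tasks())` from under an `if __name__ == "__main__":` guard would then fit the 52 x 3 = 156 jobs across the pool; the guard matters on platforms where worker processes re-import the main module.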

Theano uses marker files in ~/.theano to prevent multiple Theano processes from compiling at the same time, but I don’t think that will hurt your performance much, as long as each sampling run isn’t very short.
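
If the compile lock did turn out to be a bottleneck, one possible workaround (an assumption on my part, not something tested in this thread) is to give each worker its own compile directory through Theano's `base_compiledir` config flag. The helper names and the directory layout below are hypothetical, and the flag only takes effect if it is set before theano is first imported in that process, so it will not help workers that are forked after the parent has already imported theano.

```python
import os
import tempfile


def worker_theano_flags(worker_id, root=None):
    """Build a THEANO_FLAGS entry pointing at a per-worker compile dir.

    The directory naming scheme is hypothetical.
    """
    root = root or tempfile.gettempdir()
    compiledir = os.path.join(root, f"theano_worker_{worker_id}")
    return f"base_compiledir={compiledir}"


def configure_worker(worker_id, root=None):
    """Set THEANO_FLAGS for this process, keeping any other flags.

    Must run before `import theano` in the worker to have any effect.
    """
    existing = os.environ.get("THEANO_FLAGS", "")
    # Drop any previous base_compiledir entry, keep everything else.
    kept = [f for f in existing.split(",")
            if f and not f.startswith("base_compiledir=")]
    kept.append(worker_theano_flags(worker_id, root))
    os.environ["THEANO_FLAGS"] = ",".join(kept)
```

The trade-off is that separate directories do not share a compilation cache, so every worker compiles the model from scratch.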


#3

Great feedback, I’ll give that a shot and report back. Thanks @aseyboldt !