Limiting the number of cores/threads used in PyMC 5.6+

There seem to be a number of queries posted here about changing the number of cores used, many of them unanswered or unresolved, and most of them old. So I’m starting a new question; correct me if this is the wrong practice.

Is there documentation / examples of how to limit the number of cores used in sampling/estimation? Or how to choose a computation backend, if that’s a thing?

This is a simple four-chain estimate. It uses 128 threads!! :

Crazy! And I’m not sure how efficient this is. Aren’t all those kernel processes (i.e., the red color in the load bars) bad news?

I’m not specifying the cores= parameter in pm.sample; others say that setting it does not help with this problem.

I’m interested in what is efficient, and also in limiting the number of cores used so that I can estimate more than one model at once, and also so that my server can do other things!!
My computation processes are niced.

Thanks!

You can try setting the environment variable OMP_NUM_THREADS=1.
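As a minimal sketch of that suggestion (assuming a bash-like shell and a `python3` on the PATH — both assumptions, not part of the original post): export the cap before launching Python, so the variable is already in the environment when any BLAS library loads.

```shell
# Export the cap before launching Python, so it is inherited by the process
# before numpy/BLAS are loaded. (The shell and the `python3` name are assumptions.)
export OMP_NUM_THREADS=1

# Verify that a fresh Python process sees the cap:
python3 -c 'import os; print(os.environ.get("OMP_NUM_THREADS"))'  # prints: 1
```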

That did not make any difference!

```
!echo $OMP_NUM_THREADS
40
```

But the estimate still used 128 threads.

You can try MKL_NUM_THREADS instead.

Same thing using MKL_NUM_THREADS as well: all 128 threads fully used.

Maybe worth trying to reproduce on a more conventional machine and see if the problem also crops up there?

And to be sure, did you try setting them to 1, not 40? Are those real CPU cores or virtual ones?

Okay, I tried on my laptop with the max set to 2 and then with it set to 1. In both cases, all 16 threads of my laptop are used at 100%.

Just to be clear, this does not happen when my data size is small (1000), but it does when it is larger (10000). With the smaller samples, four threads are used (one for each chain).

How and when do you set the env variables?

From within python, before sampling the model.

     
```python
        if max_processor_threads is not None:
            os.environ["OMP_NUM_THREADS"] = str(max_processor_threads)
            print(f"Set OMP_NUM_THREADS to {max_processor_threads}")
            os.environ["MKL_NUM_THREADS"] = str(max_processor_threads)
            print(f"Set MKL_NUM_THREADS to {max_processor_threads}")
        trace_filename = f"{self.basename}.nc"

        print(f"Building model {modelclass} for {self.basename}")
        model = self.build_model(df, modelclass, **kwargs)
        with model:
            trace = pm.sample()  # return_inferencedata defaults to True
```

Try to do it before any other imports.
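A sketch of that ordering, as a standalone script (the script layout is an assumption, not the thread author's exact code): set the caps at the very top of the program, before numpy or PyMC are imported anywhere in the process, since the BLAS libraries appear to read these variables only when they first load.

```python
import os
import sys

# Set the caps FIRST, before numpy/PyMC are imported anywhere in the process.
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

# The caps can only take effect if the BLAS libraries have not loaded yet:
if "numpy" in sys.modules:
    print("warning: numpy was already imported; the caps may be ignored")
else:
    print("thread caps set before any BLAS library loaded")

# Only now import the heavy libraries (commented out to keep this sketch
# self-contained):
# import numpy as np
# import pymc as pm
```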


Thanks.
The problem still exists in PyMC 5.10.4. The trick for me was that numpy must not have been imported before setting

```
os.environ['OPENBLAS_NUM_THREADS'] = '1'
```

But I was working in an ipython environment shortcut which preloaded numpy automatically at startup. So my fix is that I now have a module with the following content, which I import first. It warns me if numpy has already been loaded before I set the OPENBLAS environment variable:

```python
import os
import sys

# On my computation server and laptop, stopping OpenBLAS multithreading speeds
# things up by stopping the wild multithreading version of numpy! Do not
# preload numpy in ipython (or in any earlier-loaded module).
os.environ['OPENBLAS_NUM_THREADS'] = '1'
# The following seem not to matter:
# os.environ['MKL_NUM_THREADS'] = '1'
# os.environ['NUMEXPR_NUM_THREADS'] = '1'
# os.environ['OMP_NUM_THREADS'] = '1'

# Check sys.modules rather than the bare name `np`: numpy preloaded by
# ipython would not be visible as a local name in this module anyway.
if 'numpy' in sys.modules:
    print('\n\nIt looks like you have loaded numpy before we disabled '
          'OPENBLAS crazy-threading. Do not preload numpy in ipython.\n\n')
    raise ImportError("Start ipython without numpy preloaded")
print('Successfully checked that numpy was not preloaded before setting the OpenBLAS variable.')
import numpy as np
```

Now htop looks a lot nicer! This is with 15 estimates now running in parallel, each taking four threads for its four chains, with no OPENBLAS splitting:


Unlike before, those processes are blue, not red. :slight_smile:
