Limiting the number of cores/threads used in PyMC 5.6+

There seem to be a number of queries posted here about changing the number of cores used, many of them unanswered or unresolved. And most are old. So I’m starting a new question, but correct me if this is wrong practice.

Is there documentation / examples of how to limit the number of cores used in sampling/estimation? Or how to choose a computation backend, if that’s a thing?

This is a simple four-chain estimate, yet it uses all 128 threads:

Crazy! And I’m not sure how efficient this is. Aren’t all those cycles spent in the kernel (i.e., the red color in the htop load bars) bad news?

I’m not specifying the cores= parameter in pm.sample; others have said that setting it does not help with this problem.

I’m interested in what is efficient, and also in limiting the number of cores used so that I can estimate more than one model at once and so that my server can do other things!!
My computation processes are niced.

Thanks!
c

You can try setting the environment variable OMP_NUM_THREADS=1.
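
For example (just a sketch; the variable needs to be set before the numerical libraries are imported):

```python
import os
os.environ["OMP_NUM_THREADS"] = "1"  # must be set before numpy/pymc are imported

import pymc as pm  # import only after the environment variable is set
```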

That did not make any difference!

```
!echo $OMP_NUM_THREADS
40
```

But the estimate still used 128 threads.

You can try MKL_NUM_THREADS instead

Same thing using MKL_NUM_THREADS as well: all 128 threads fully used.

Maybe worth trying to reproduce on a more conventional machine and see if the problem also crops up there?

And to be sure, did you try setting them to 1, not 40? Are those real CPU cores or virtual ones?
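
(One quick way to check is something like this; psutil is a separate package, so this is just a sketch:)

```python
import os
import psutil  # separate package: pip install psutil

print("logical CPUs  :", os.cpu_count())                  # includes hyperthreads / SMT
print("physical cores:", psutil.cpu_count(logical=False))
```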

Okay, I tried on my laptop with the max set to 2 and then with it set to 1. In both cases, all 16 threads of my laptop are used at 100%.

Just to be clear, this does not happen when my data size is small (1000) but does when it is larger (10000). With smaller samples, four threads are used (one for each chain).

How and when do you set the env variables?

From within Python, before sampling the model:

     
```python
if max_processor_threads is not None:
    os.environ["OMP_NUM_THREADS"] = str(max_processor_threads)
    print(f"Set OMP_NUM_THREADS to {max_processor_threads}")
    os.environ["MKL_NUM_THREADS"] = str(max_processor_threads)
    print(f"Set MKL_NUM_THREADS to {max_processor_threads}")
trace_filename = f"{self.basename}.nc"

print(f"Building model {modelclass} for {self.basename}")
model = self.build_model(df, modelclass, **kwargs)
with model:
    trace = pm.sample()  # return_inferencedata=True
```

Try to do it before any other imports


Thanks.
Problem still exists in PyMC 5.10.4 etc.
The trick for me was that it is numpy that must not have been imported prior to setting

```python
os.environ['OPENBLAS_NUM_THREADS'] = '1'
```

But I was working with an ipython startup shortcut which preloaded numpy automatically before I got going.
So my fix is that I now have a module with the following content, which I import first. It warns me if numpy has already been loaded before I set the OPENBLAS environment variable:

```python
import os
import sys

# On my computation server and laptop, stopping OpenBLAS multithreading speeds
# things up by stopping the wildly multithreaded numpy! Do not preload numpy in
# ipython (or in any earlier-loaded modules).
os.environ['OPENBLAS_NUM_THREADS'] = '1'
# The following seem not to matter:
# os.environ['MKL_NUM_THREADS'] = '1'
# os.environ['NUMEXPR_NUM_THREADS'] = '1'
# os.environ['OMP_NUM_THREADS'] = '1'

# Check whether numpy has already been imported anywhere (e.g. by an ipython startup file):
if 'numpy' in sys.modules:
    print('\n\nIt looks like you have loaded numpy before we have disabled OPENBLAS '
          'crazy-threading. Do not preload numpy in ipython.\n\n')
    raise ImportError("Start ipython without numpy preloaded")
else:
    print(' Successfully checked that numpy was not preloaded before setting OpenBLAS variable.')
    import numpy as np
```
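
In a script or ipython session the ordering then looks roughly like this (the module name limit_blas_threads is just an example; adapt as needed):

```python
# Import the guard module before anything that might pull in numpy,
# so that OPENBLAS_NUM_THREADS is already set when OpenBLAS gets loaded.
import limit_blas_threads  # example name for the module shown above

import pymc as pm  # numpy now comes in with OpenBLAS limited to one thread

# ... build and sample the model as usual ...
```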

Now htop looks a lot nicer! This is with 15 estimates now going in parallel, each taking 4 threads for its 4 chains, and with no OpenBLAS splitting:


Unlike before, those processes are blue, not red. 🙂


Thank you! I run into the same problem when I try to estimate the resource requirements on the login node of our HPC before submitting jobs. It would use lots of threads, which leads to vastly increased running time of pm.fit() in my case. Setting os.environ['OPENBLAS_NUM_THREADS'] = '4' (after import os) at the beginning of my scripts seems to work well to limit this.
Interestingly, limiting the threads on my laptop also makes pm.fit() run a bit faster than using all cores (which it does by default). I don’t understand enough about what’s going on behind the scenes, but it seems that the sweet spot is around 4 threads?
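
If you want to probe that sweet spot without restarting Python for every thread count, something like the sketch below might help. It assumes the threadpoolctl package is installed, and the toy model and thread counts are made up purely for illustration:

```python
import time

import numpy as np
import pymc as pm
from threadpoolctl import threadpool_limits  # separate package: pip install threadpoolctl

# Toy model purely for timing; substitute your own.
rng = np.random.default_rng(0)
y = rng.normal(size=10_000)

with pm.Model() as model:
    mu = pm.Normal("mu", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("obs", mu, sigma, observed=y)

for n_threads in (1, 2, 4, 8):
    # threadpool_limits caps BLAS/OpenMP threads at runtime, even after numpy is loaded
    with threadpool_limits(limits=n_threads):
        t0 = time.perf_counter()
        with model:
            pm.fit(n=10_000, progressbar=False)
        print(f"{n_threads} threads: {time.perf_counter() - t0:.1f} s")
```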

Regarding the optimal number of threads: I have seen in other (non-PyMC) applications that using fewer threads can boost performance. This happens in particular when the tasks are memory-intensive; the data-transport overhead then clogs the processor.

It can also help to try various linear algebra libraries.
But np.show_config() is known to not always be accurate about which library is actually in use.
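
One way to see which BLAS/OpenMP libraries are actually loaded at runtime (as opposed to what numpy was built against) is the threadpoolctl package; sketch below, assuming it is installed:

```python
import numpy as np
np.show_config()  # what numpy was built against; not always what is loaded at runtime

from threadpoolctl import threadpool_info  # separate package: pip install threadpoolctl
for lib in threadpool_info():
    print(lib["internal_api"], lib["filepath"], "num_threads =", lib["num_threads"])
```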

The common problem is the system trying to use more threads than are actually available; the extra threads then have to wait around until cores are released.
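
As a rough illustration of that oversubscription arithmetic (the numbers here are hypothetical, just to show the shape of the problem):

```python
# With multiprocess sampling, each chain process can let OpenBLAS spawn
# its own pool of BLAS threads on top of the chain itself.
n_chains = 4
blas_threads_per_chain = 32   # hypothetical
physical_cores = 64           # hypothetical

total_threads = n_chains * blas_threads_per_chain
print(f"{total_threads} worker threads competing for {physical_cores} cores "
      f"-> oversubscribed: {total_threads > physical_cores}")
```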