There seem to be a number of queries posted here about changing the number of cores used, many of them unanswered or unresolved. And most are old. So I’m starting a new question, but correct me if this is wrong practice.
Is there documentation / examples of how to limit the number of cores used in sampling/estimation? Or how to choose a computation backend, if that’s a thing?
This is a simple four-chain estimate. It uses 128 threads!! :
Crazy! And I’m not sure how efficient this is. Aren’t all those kernel processes (i.e. the red color in the load bars) bad news?
I’m not specifying the `cores=` parameter in `pm.sample`, which others say does not help with this problem.
I’m interested in what is efficient, and also in limiting the number of cores used so that I can estimate more than one model at once, and also so that my server can do other things!!
My computation processes are `nice`d.
Thanks!
You can try setting the environment variable `OMP_NUM_THREADS=1`.
That did not make any difference!
```
!echo $OMP_NUM_THREADS
40
```
But the estimate still used 128 threads.
You can try `MKL_NUM_THREADS` instead.
Same thing using `MKL_NUM_THREADS` as well: all 128 threads fully used.
Maybe worth trying to reproduce on a more conventional machine and see if the problem also crops up there?
And to be sure, did you try setting them to 1, not 40? Are those real CPU cores or virtual ones?
Okay, I tried on my laptop with the max set to 2 and then with it set to 1. In both cases, all 16 threads of my laptop are used at 100%.
Just to be clear, this does not happen when my data size is small (1000) but does when it is larger (10000). With smaller samples, four threads are used (one for each chain).
How and when do you set the env variables?
From within python, before sampling the model.
```python
if max_processor_threads is not None:
    os.environ["OMP_NUM_THREADS"] = str(max_processor_threads)
    print(f"Set OMP_NUM_THREADS to {max_processor_threads}")
    os.environ["MKL_NUM_THREADS"] = str(max_processor_threads)
    print(f"Set MKL_NUM_THREADS to {max_processor_threads}")
trace_filename = f"{self.basename}.nc"
print(f"Building model {modelclass} for {self.basename}")
model = self.build_model(df, modelclass, **kwargs)
with model:
    trace = pm.sample()  # return_inferencedata=True
```
Try to do it before any other imports.
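The reason import order matters is that the BLAS/OpenMP runtimes read these variables once, when the library is first loaded. A minimal sketch of a script that sets them in time (the variable names are the standard BLAS/OpenMP ones; the rest is illustrative, and assumes `numpy` is installed):

```python
import os

# Cap the thread pools BEFORE numpy (and anything that imports numpy,
# such as PyMC) is loaded -- the values are read at library load time.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np  # the BLAS thread pool is now capped at one thread

# Quick sanity check that the variables were set in time:
print(os.environ["OPENBLAS_NUM_THREADS"])  # -> 1
```

Setting the same variables after `import numpy` (or in a session where something else already imported it) has no effect, which is the usual way this advice silently fails.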
Thanks.
Problem still exists in PyMC 5.10.4 etc.
The trick for me was that it is `numpy` which must not have been imported prior to setting `os.environ['OPENBLAS_NUM_THREADS'] = '1'`. But I was working in an `ipython` environment shortcut which preloaded `numpy` automatically at startup.
So my fix is that I now have a module with the following content, which I import first. It warns me if `numpy` has already been loaded before I set the OPENBLAS environment variable:
```python
import os

# On my computation server and laptop, stopping OpenBLAS multithreading
# speeds things up by stopping numpy's wild multithreading! Do not preload
# numpy in ipython (or in any earlier-loaded modules).
os.environ['OPENBLAS_NUM_THREADS'] = '1'
# The following seem not to matter:
# os.environ['MKL_NUM_THREADS'] = '1'
# os.environ['NUMEXPR_NUM_THREADS'] = '1'
# os.environ['OMP_NUM_THREADS'] = '1'
try:
    np  # raises NameError unless numpy is already bound to this name
    print("\n\nIt looks like you have loaded numpy before we've disabled "
          "OPENBLAS crazy-threading. Do not preload numpy in ipython.\n\n")
    raise ImportError("Start ipython without numpy preloaded")
except NameError:
    print(' Successfully checked that numpy was not preloaded before setting OpenBLAS variable.')
import numpy as np
```
Now `htop` looks a lot nicer! This is with 15 estimates going in parallel, each taking 4 threads for its 4 chains, with no OpenBLAS splitting:
Unlike before, those processes are blue, not red.
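As a side note, a more robust way to detect a preloaded `numpy` is to look in `sys.modules`, which records every module imported anywhere in the process, rather than relying on the name `np` being visible in the current namespace. This is just a sketch of an alternative check, not the module above:

```python
import os
import sys

# OPENBLAS_NUM_THREADS is only read when the BLAS library loads, so the
# limit must be set before numpy's first import anywhere in the process.
if "numpy" in sys.modules:
    raise ImportError("numpy already loaded; start ipython without preloading it")

os.environ["OPENBLAS_NUM_THREADS"] = "1"
import numpy as np

print("numpy imported after the thread limit was set")
```

The advantage is that this catches a `numpy` imported under any name, or imported indirectly by another module, which the `NameError` trick would miss.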