I’m trying to speed up the fitting of my hierarchical model by increasing the number of CPU cores used. As I understand it, PyMC3 uses half of the cores by default (which also matches my experience on several computers with different numbers of CPU cores).
In contrast to pm.sample(), there is no cores parameter in pm.fit().
As the VI interface of PyMC3 uses scipy for the optimization, I tried to increase the number of cores by setting the following environment variables to the maximum number of cores:
os.environ['MKL_NUM_THREADS'] = '20'
os.environ['OPENBLAS_NUM_THREADS'] = '20'
os.environ['OMP_NUM_THREADS'] = '20'
os.environ['NUMEXPR_NUM_THREADS'] = '20'
Nonetheless, the computation still uses only half of the cores. Setting the environment variables to a number smaller than half of the cores, on the other hand, does limit the number of cores actually used, so capping works as expected.
Is there a way to increase the number of cores used by the PyMC3 VI interface? This would be of great help.
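One thing worth checking with settings like the ones above: thread-count variables of this kind are generally read when the underlying BLAS/OpenMP library is loaded, so they need to be set before numpy, Theano, or PyMC3 is imported. The sketch below shows that ordering. The helper set_thread_limits is hypothetical (not part of PyMC3), and whether a higher value actually raises core usage depends on the BLAS build and the model. Note also that os.cpu_count() reports logical cores; on machines with hyper-threading, "half of the cores" may simply be the number of physical cores.

```python
import os

# Variable names taken from the post above.
THREAD_VARS = (
    "MKL_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "OMP_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def set_thread_limits(n_threads):
    """Hypothetical helper: set all thread-count variables to n_threads."""
    for name in THREAD_VARS:
        os.environ[name] = str(n_threads)
    return {name: os.environ[name] for name in THREAD_VARS}


# Set the limits first ...
limits = set_thread_limits(20)

# ... and only afterwards import the numerical stack, for example:
# import numpy as np
# import pymc3 as pm

# Logical core count as seen by the OS (may be twice the physical count).
print("logical cores:", os.cpu_count())
print("thread limits:", limits)
```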
Hi! PyMC3 does not use scipy everywhere. In VI, we use gradient descent implemented in Theano, and it is Theano that decides how many cores are used. Usually, this is the same number that numpy uses. The environment variables look reasonable, but the core count is probably not the bottleneck of your model.
For performance gains, I would suggest trying minibatches; this might help :) It will not increase the number of cores used, but iterations will be faster (you might also want to change the learning rate).
Hey, thanks for your reply and sorry for my late response. The minibatches are a very helpful feature.
Anyway, I did some more research on controlling which and how many cores are used. I found that it depends very much on the model; in some cases PyMC3 even uses all 20 cores with the following settings:
os.environ['MKL_NUM_THREADS'] = '20'
os.environ['OMP_NUM_THREADS'] = '20'
os.environ['openmp'] = 'True'
OpenMP also offers more control than just setting the number of cores. For anybody interested in controlling CPU and core usage, some very helpful slides can be found here (especially slides 17-22): https://www.ixpug.org/documents/1506981937ixpugfall2017_21_up2.pdf
Hi,
I am able to control the number of cores with:
os.environ['MKL_NUM_THREADS'] = '4'
os.environ['OMP_NUM_THREADS'] = '4'
os.environ['GOTO_NUM_THREADS'] = '4'
Nevertheless, I am seeing some strange behavior. I have a machine with 56 cores, and my model takes 52 h to run with all cores in use. With 8 cores it takes ~22 h, and with 4 cores ~18 h. I expected the wall time to decrease as more cores are used, but the opposite happens. Any recommendations?
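One possible explanation, offered as a guess rather than a diagnosis, is that thread synchronization overhead outweighs the gain when the individual matrix operations are small. A way to check is to time a shortened run at several thread counts, each in a fresh process so the variables are in place before any library loads. The sketch below assumes this approach; time_with_threads is a hypothetical helper and the command is a placeholder for the real model script.

```python
import os
import subprocess
import sys
import time


def time_with_threads(command, n_threads):
    """Run command in a fresh process with thread limits set; return wall seconds."""
    env = dict(os.environ)
    for name in ("MKL_NUM_THREADS", "OMP_NUM_THREADS", "GOTO_NUM_THREADS"):
        env[name] = str(n_threads)
    start = time.perf_counter()
    subprocess.run(command, env=env, check=True)
    return time.perf_counter() - start


# Placeholder workload: replace with a shortened run of the actual model,
# e.g. [sys.executable, "my_model.py"] (hypothetical file name).
command = [sys.executable, "-c", "sum(i * i for i in range(100000))"]

timings = {n: time_with_threads(command, n) for n in (1, 2, 4, 8)}
for n, seconds in timings.items():
    print(f"{n:>2} threads: {seconds:.2f} s")
```

If the wall time stops improving or gets worse beyond a certain thread count, capping the variables at that count is a reasonable setting for the full run.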