Regarding the use of multiple cores

Maybe it helps to explain a bit how the parallelization works:

When you specify cores>1 in pm.sample, pymc will start one new process for each chain. The main process then tells cores of those processes to start sampling, while the others just wait and do nothing. When one of the sampling processes finishes, one of the waiting processes is told to start. There will never be more than cores processes working at the same time. If you have n cores in your computer, it makes sense for most models to set the cores argument to that number, so that all of them are kept busy.
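A minimal sketch of what that looks like (assuming PyMC3; the model here is just a placeholder to have something to sample):

```python
import multiprocessing

import pymc3 as pm

with pm.Model():
    # Toy model, only there so that pm.sample has something to do.
    mu = pm.Normal("mu", mu=0.0, sigma=1.0)
    pm.Normal("obs", mu=mu, sigma=1.0, observed=[0.1, -0.3, 0.2])

    # One process is created per chain, but at most `cores` of them
    # sample at the same time; the rest wait until a slot frees up.
    trace = pm.sample(chains=6, cores=multiprocessing.cpu_count())
```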
Things get a bit more complicated if the model uses very large arrays somewhere, or if it involves a lot of BLAS operations (matrix-vector and matrix-matrix multiplications and some other dense linear algebra, e.g. in a model with large GPs). Each of the chain processes might then start additional workers of its own: with large arrays, theano will start a thread pool using openmp (whether it does can be configured in .theanorc, and the size of that thread pool is controlled with the OMP_NUM_THREADS environment variable). Depending on the BLAS implementation, the number of threads it uses is controlled with MKL_NUM_THREADS or OPENBLAS_NUM_THREADS.
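One common way to keep those extra workers in check is to set the environment variables before numpy/theano are imported (the variable names are the ones mentioned above; the value 2 is just an illustration):

```python
import os

# Must be set before numpy/theano load the BLAS library, otherwise the
# thread pools have already been created with their default sizes.
os.environ["OMP_NUM_THREADS"] = "2"       # theano's openmp thread pool
os.environ["MKL_NUM_THREADS"] = "2"       # if numpy/theano link against MKL
os.environ["OPENBLAS_NUM_THREADS"] = "2"  # if they link against OpenBLAS

import numpy as np
import pymc3 as pm
```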

Unfortunately, those three sources of parallelism do not know anything about each other. So it can easily happen that you start 8 processes with pm.sample(cores=8), and each of those starts 8 BLAS threads. That gives you 64 workers in total, which will really slow things down. The operating system will do its best to distribute them across the available cores/hardware threads, but if there are not enough available, things slow down because the processes fight over resources like the cache, and because you pay the costs of parallelization without any benefit. In cases like that you need to either decrease cores or the number of BLAS/openmp threads.
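As a rough sketch of how you might split a fixed hardware budget between chains and BLAS/openmp threads (the numbers are only an example):

```python
import multiprocessing
import os

n_hardware = multiprocessing.cpu_count()    # e.g. 8 hardware threads
cores = 4                                   # chains sampling in parallel
blas_threads = max(1, n_hardware // cores)  # e.g. 2 threads per chain process

# Again, set these before numpy/theano are imported.
os.environ["OMP_NUM_THREADS"] = str(blas_threads)
os.environ["MKL_NUM_THREADS"] = str(blas_threads)
os.environ["OPENBLAS_NUM_THREADS"] = str(blas_threads)

import pymc3 as pm
# ... build the model and call pm.sample(cores=cores, ...) as usual
```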
