Cores not optimally used

cosine · November 22, 2024, 10:58am

Hi
I’m using PyMC with the default options (i.e. 4 chains, no cores option) on a 4 core/8 thread machine.
Instead of using all cores, it puts 2 chains on one core.

ps uax shows
472689 xx 20 0 2285228 849228 19000 R 50.0 3.5 2:48.93 python
472686 xx 20 0 2285196 849228 19000 R 49.7 3.5 2:48.73 python
472687 xx 20 0 2285196 849228 19000 R 49.7 3.5 2:49.56 python
472688 xx 20 0 2285196 849228 19000 R 49.7 3.5 2:49.31 python

Distributing the chains over all cores should allow a 2x speedup.

I also don’t see the typical core-switching that one usually sees when running CPU-intensive jobs.

Finally, I also tested on a workstation with many cores, with the same result.

ricardoV94 · November 22, 2024, 11:40am

I think by default PyMC uses as many cores as chains? Did you try setting the cores argument to pm.sample explicitly?

cosine · November 22, 2024, 4:16pm

Hi
Setting cores in pm.sample(cores=4) did not change anything.

I think the cause is deeper down the rabbit hole, in how PyMC deals with parallel jobs.

ricardoV94 · November 25, 2024, 10:17am

Does setting cores=4 still result in PyMC sampling only 2 chains x 2 cores? Can you show the logging message it prints?

cosine · November 25, 2024, 11:03am

It runs 4 chains alright, but they are distributed over 2 cores only, not 4.

Which log would you like to see?

ricardoV94 · November 25, 2024, 5:25pm

This line: pymc/pymc/sampling/mcmc.py at 5352798ee0d36ed566e651466e54634b1b9a06c8 · pymc-devs/pymc · GitHub

cosine · November 25, 2024, 5:40pm

It reports:
Multiprocess sampling (4 chains in 4 jobs)

cosine · November 25, 2024, 5:41pm

What does PyMC use for multithreading?

cosine · November 25, 2024, 5:48pm

I tried OMP_NUM_THREADS=4 and OMP_NUM_THREADS=1, but saw no difference.
IPython vs. python made no difference either.
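One detail worth ruling out when testing thread-count variables: they generally need to be in the environment before the numerical libraries are loaded. The sketch below launches a fresh interpreter with the limits already set. The variable list is an assumption about which BLAS/OpenMP backend is installed (OPENBLAS_NUM_THREADS and MKL_NUM_THREADS are only honored by their respective libraries), and the helper names are hypothetical.

```python
import os
import subprocess
import sys

# Which of these is honored depends on the installed BLAS/OpenMP backend (assumption).
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def env_with_thread_limit(n_threads):
    """Return a copy of the current environment with thread limits set."""
    env = dict(os.environ)
    for name in THREAD_VARS:
        env[name] = str(n_threads)
    return env


def run_with_thread_limit(python_args, n_threads):
    """Run a fresh Python interpreter so the limits exist before any import."""
    return subprocess.run(
        [sys.executable, *python_args],
        env=env_with_thread_limit(n_threads),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Replace the "-c" snippet with the path of the sampling script to test.
    result = run_with_thread_limit(
        ["-c", "import os; print(os.environ['OMP_NUM_THREADS'])"], 1
    )
    print(result.stdout.strip())
```

Setting the variables on the shell command line (as done above in the thread) has the same effect; the sketch only makes the ordering explicit.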

aseyboldt · November 25, 2024, 8:01pm

Can you show a complete example that we can run locally to compare?

This here uses 4 cores as it should on my machine with pymc 5.18.2:

import pymc as pm
import numpy as np
import pytensor.tensor as pt

with pm.Model() as model:
    pm.Normal("x", shape=1000)

with model:
    trace = pm.sample(draws=100_000, chains=4, cores=4, blas_cores="auto")

You can also try to change the blas_cores argument. If your logp function spends a lot of time in blas calls (matrix multiplications etc), that might change the behaviour, although I don’t understand why it would only use half the cores by default. This also uses 4 cores locally for me:

import pymc as pm
import numpy as np
import pytensor.tensor as pt

A = np.random.randn(1000, 1000)

with pm.Model() as model:
    x = pm.Normal("x", shape=1000)
    b = pm.Normal("y", mu=A @ x, shape=1000)

with model:
    trace = pm.sample(draws=100_000, chains=4, cores=4, blas_cores="auto")

cosine · November 25, 2024, 8:59pm

Thanks for helping out.
I tried the first script with also changing the blas_cores, same result as before: only 2 cores used, 50% cpu per python process.
This is with pymc 5.18.1

I am starting to suspect that whatever library PyMC is using is misbehaving, also because of this odd core stickiness.
Usually jobs move around when the CPU is not fully loaded, to improve thermals. Here this doesn’t happen.

I regularly use Python’s multiprocessing library, and that works fine.
I checked whether psutil returns the right number of cores, and it does.
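The "core stickiness" described above can be inspected directly by asking the operating system which CPUs the process is allowed to run on. This is only a diagnostic sketch for Linux; the thread does not establish that CPU affinity was the cause, and allowed_cpus is a hypothetical helper name.

```python
import os


def allowed_cpus():
    """Return the sorted CPU indices this process may run on, or None if unsupported."""
    # os.sched_getaffinity is only available on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return None


if __name__ == "__main__":
    print("logical CPUs:", os.cpu_count())
    print("allowed CPUs:", allowed_cpus())
    # To see whether an import changes the mask, import the library under
    # suspicion here (e.g. numpy or pymc) and print allowed_cpus() again.
```

If the list of allowed CPUs is shorter than the number of logical CPUs, the processes are pinned and cannot spread out, whatever value is passed as cores.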

aseyboldt · November 25, 2024, 9:25pm

That’s really strange. PyMC just uses multiprocessing, there is nothing particularly special happening.

Can you try to disable the progress bar? That also has some extra threads, and while I don’t see why it would cause something like that, it is better to disable it for debugging, I think.

Can you also check if this is using the 4 threads correctly?

import multiprocessing as mp

def run_loop():
    while True:
        pass

processes = [mp.Process(target=run_loop) for _ in range(4)]

for process in processes:
    process.start()

Other than that, what operating system are you using? How did you install the packages? Anything else that could be unusual about your setup?
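A bounded variant of the check above may be easier to compare across machines, because it terminates on its own and reports a number per worker. This is a sketch, not part of the original post: spin is a hypothetical helper, and the worker count and duration are arbitrary.

```python
import multiprocessing as mp
import time


def spin(seconds):
    """Busy-loop for `seconds` of wall time; return CPU time used / wall time elapsed."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    while time.perf_counter() - wall_start < seconds:
        pass
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall


if __name__ == "__main__":
    n_workers = 4
    with mp.Pool(processes=n_workers) as pool:
        # One long task per worker, so all workers are busy at the same time.
        ratios = pool.map(spin, [1.0] * n_workers)
    for i, ratio in enumerate(ratios):
        print(f"worker {i}: {ratio:.0%} of one core")
```

A ratio near 100% means the worker had a core to itself; values near 50% would match the CPU usage reported at the start of the thread.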

cosine · November 25, 2024, 10:43pm

The multiprocessing script uses all cores.
Turning the progress bar off in the previous script made no difference.

The multiprocessing function call has a cores option too.
E.g. pool = Pool(maxcpu=20)

Is this inherited from pymc?

For the rest, this is Ubuntu with Anaconda; PyMC is installed with pip.
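For reference, the standard-library Pool takes its worker count through the processes argument; maxcpu is not one of its parameters, so the call quoted above presumably comes from a different wrapper. A Pool created in user code is configured separately from the cores argument passed to pm.sample. A minimal sketch, with square as a placeholder task:

```python
from multiprocessing import Pool


def square(x):
    """Placeholder task; any picklable top-level function works."""
    return x * x


if __name__ == "__main__":
    # processes=None (the default) lets Pool pick the worker count itself.
    with Pool(processes=4) as pool:
        print(pool.map(square, range(8)))
```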

cluhmann · November 25, 2024, 10:56pm

What happens if you install via the official installation instructions?

cosine · November 26, 2024, 10:56am

Cluhmann,
That actually fixed it.
Thank you!

(very frustrating python’s package management!)

ricardoV94 · November 26, 2024, 11:37am

I have to say your case is very puzzling

cosine · November 26, 2024, 4:19pm

Thanks again for the help!
I guess there is some subtle change between versions in how the multiprocessing via pipes is done…
