Hi
I’m using PyMC with the default options (i.e. 4 chains, no cores argument) on a 4-core/8-thread machine.
Instead of using all cores, it puts 2 chains on one core.
ps uax shows
472689 xx 20 0 2285228 849228 19000 R 50.0 3.5 2:48.93 python
472686 xx 20 0 2285196 849228 19000 R 49.7 3.5 2:48.73 python
472687 xx 20 0 2285196 849228 19000 R 49.7 3.5 2:49.56 python
472688 xx 20 0 2285196 849228 19000 R 49.7 3.5 2:49.31 python
Distributing the chains over all cores should allow a 2x speedup.
I also don’t see the typical core-switching that one usually sees when running CPU-intensive jobs.
Finally, I also tested on a workstation with many cores, with the same result.
Can you show a complete example that we can run locally to compare?
This here uses 4 cores as it should on my machine with pymc 5.18.2:
import pymc as pm
import numpy as np
import pytensor.tensor as pt
with pm.Model() as model:
    pm.Normal("x", shape=1000)
with model:
    trace = pm.sample(draws=100_000, chains=4, cores=4, blas_cores="auto")
You can also try to change the blas_cores argument. If your logp function spends a lot of time in blas calls (matrix multiplications etc), that might change the behaviour, although I don’t understand why it would only use half the cores by default. This also uses 4 cores locally for me:
import pymc as pm
import numpy as np
import pytensor.tensor as pt
A = np.random.randn(1000, 1000)
with pm.Model() as model:
    x = pm.Normal("x", shape=1000)
    b = pm.Normal("y", mu=A @ x, shape=1000)
with model:
    trace = pm.sample(draws=100_000, chains=4, cores=4, blas_cores="auto")
Thanks for helping out.
I tried the first script, also changing blas_cores. Same result as before: only 2 cores used, 50% CPU per python process.
This is with pymc 5.18.1.
I’m starting to suspect that whatever library pymc is using is misbehaving, also because of this odd core stickiness.
Usually jobs move around between cores when the CPU is not fully loaded, to improve thermals. Here this doesn’t happen.
I regularly use Python’s multiprocessing library, and that works fine.
I checked whether psutil returns the right number of cores, and it does.
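A related thing that may be worth checking is not just how many cores exist, but which cores the Python process is allowed to run on, since a restricted CPU affinity could in principle produce this kind of stickiness. The following is only a sketch and assumes Linux: `os.sched_getaffinity` is not available on every platform, and `cpu_report` is a name made up for this example.

```python
import os


def cpu_report():
    """Return the logical CPU count and the CPUs this process may run on."""
    logical = os.cpu_count()
    # os.sched_getaffinity only exists on some platforms (e.g. Linux),
    # so fall back to None where it is missing.
    if hasattr(os, "sched_getaffinity"):
        allowed = sorted(os.sched_getaffinity(0))
    else:
        allowed = None
    return {"logical_cpus": logical, "allowed_cpus": allowed}


if __name__ == "__main__":
    report = cpu_report()
    print("logical CPUs:", report["logical_cpus"])
    print("CPUs this process may use:", report["allowed_cpus"])
```

If the list of allowed CPUs is shorter than the logical CPU count, the processes are being confined by something outside of PyMC.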
That’s really strange. PyMC just uses multiprocessing, there is nothing particularly special happening.
Can you try disabling the progress bar (progressbar=False in pm.sample)? It also uses some extra threads, and while I don’t see why it would cause something like this, it is better to disable it for debugging, I think.
Can you also check if this is using the 4 threads correctly?
import multiprocessing as mp
def run_loop():
    while True:
        pass
processes = [mp.Process(target=run_loop) for _ in range(4)]
for process in processes:
    process.start()
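If it helps, here is a bounded variant of the same test that exits on its own and reports how much of a core each process actually got. It is only a sketch: `burn` and `measure` are names made up for this example, and the 2-second duration is arbitrary.

```python
import multiprocessing as mp
import os
import time


def burn(duration, queue):
    """Busy-loop for `duration` wall-clock seconds, then report CPU usage."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    while time.perf_counter() - wall_start < duration:
        pass
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    # Fraction of one core this process received while it wanted 100%.
    queue.put((os.getpid(), cpu / wall))


def measure(n_procs=4, duration=2.0):
    """Run n_procs busy loops in parallel; return {pid: cpu_fraction}."""
    queue = mp.Queue()
    procs = [mp.Process(target=burn, args=(duration, queue)) for _ in range(n_procs)]
    for p in procs:
        p.start()
    # Collect results before joining so the queue never blocks a child.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    for pid, frac in measure().items():
        print(f"pid {pid}: {frac:.0%} of one core")
```

A value near 100% per process means each one had a full core to itself. Values near 50% would reproduce the behaviour described above without PyMC being involved at all.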
Other than that, what operating system are you using? How did you install the packages? Anything else that could be unusual about your setup?