NUTS uses all cores

NateAM · March 7, 2018, 5:17pm

I am running pymc3 on a machine with a large number of cores (>32) and when sampling using NUTS all of the cores are being utilized. I have set the following environment variables:

MKL_NUM_THREADS=8
OMP_NUM_THREADS=8

but I still observe this behavior with NUTS, though Metropolis is well behaved. Are there any other variables I should set to limit the core usage?

junpenglao · March 7, 2018, 5:39pm

Some of the Theano ops will use all the cores, for example in the GP module. I think @bwengals knows a bit more on this.

NateAM · March 7, 2018, 6:38pm

It is strange because I was alright using NUTS on some of the problems I was working on yesterday but now something seems to have changed.

If there is a way to limit NUTS to only use a specific number of cores that would be very helpful.

NateAM · March 7, 2018, 7:13pm

The reason it worked before was user error. I was editing a jupyter notebook and some variables were stored that I didn’t recognize.

junpenglao · March 7, 2018, 7:52pm

Do you use MvNormal distribution in your model?

NateAM · March 7, 2018, 10:02pm

Yes I am.

junpenglao · March 8, 2018, 1:07pm

Yeah, from "Multidimensional gaussian process" (post #4, by bwengals):

The matrix operations used by Theano here are multithreaded, so running multiple chains simultaneously bogs things down.

I am not sure how to limit it to a single thread per chain, though.

NateAM · March 8, 2018, 2:32pm

Okay, it isn’t an issue at the moment but it would be helpful if that was something we could set in the future.

Jan · December 19, 2018, 10:54pm

Hey, I found the same issue for HMC and NUTS. It is really troublesome since I am working on a shared machine where I am not allowed to occupy all CPUs. Is there a way to limit the cores that a sampler can use? Metropolis works well by setting cores=1.

aseyboldt · December 20, 2018, 8:56am

There are three reasons why NUTS and HMC would use several cores:

1. Some Theano ops use BLAS, which will usually be multithreaded. There are several implementations of BLAS, and which one we use depends on which one numpy uses (you can check with np.__config__.show()). If you are using MKL, you can control the number of threads by setting the environment variable MKL_NUM_THREADS. For OpenBLAS, the equivalent variable is OPENBLAS_NUM_THREADS (it also honors OMP_NUM_THREADS). If you are using ATLAS, then you are out of luck, as that one must be configured at compile time.
2. Some Theano ops use OpenMP explicitly. You could switch that off entirely by setting a config option in ~/.theanorc: http://deeplearning.net/software/theano/library/config.html#config.openmp. And you can control the number of threads using OMP_NUM_THREADS.
3. By default we use multiprocessing to parallelize several chains. You can control the number of cores we use there by setting the cores keyword argument, as in pm.sample(cores=4). So the total number of cores you might be using at most is cores * max(MKL_NUM_THREADS, OMP_NUM_THREADS).
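
The thread limits described above could be applied from Python roughly as follows. This is only a sketch: the helper names limit_threads and worst_case_cores are hypothetical, and OPENBLAS_NUM_THREADS is an addition not mentioned in the original post. The variables generally have to be set before numpy, Theano, or pymc3 are imported, since the BLAS and OpenMP runtimes usually read them once at load time.

```python
import os

# Environment variables read by common BLAS/OpenMP runtimes.
# OPENBLAS_NUM_THREADS is an assumption added here for OpenBLAS builds.
THREAD_VARS = ("MKL_NUM_THREADS", "OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_threads(n_threads):
    """Set the per-process thread limits (call before importing numpy/theano)."""
    for var in THREAD_VARS:
        os.environ[var] = str(n_threads)


def worst_case_cores(cores, mkl_threads, omp_threads):
    """Upper bound from the post: cores * max(MKL_NUM_THREADS, OMP_NUM_THREADS)."""
    return cores * max(mkl_threads, omp_threads)


limit_threads(2)
budget = worst_case_cores(cores=4, mkl_threads=2, omp_threads=2)
print(budget)  # prints 8

# Only after the variables are set would one import the libraries, e.g.:
#   import pymc3 as pm
#   with model:
#       trace = pm.sample(cores=4)
```

With these settings, four chains running in parallel with two threads each could occupy up to eight cores, which is the number to compare against whatever limit applies on a shared machine.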

ferrine · December 20, 2018, 3:03pm

So cores is not a correct argument name…

Jan · December 21, 2018, 10:27pm

FYI, by setting MKL_NUM_THREADS=1 and turning off the Theano config.openmp option, the sampler actually seems to run faster. Maybe this has something to do with the overhead induced by inter-process communication.
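
A configuration like the one described here could be set from the top of a script instead of through the shell and ~/.theanorc. The sketch below assumes that Theano picks up the comma-separated THEANO_FLAGS environment variable at import time and that openmp=False is the flag form of the config.openmp option; it has to run before Theano or pymc3 are imported.

```python
import os

# One BLAS thread per process (per chain when sampling with cores > 1).
os.environ["MKL_NUM_THREADS"] = "1"

# Merge openmp=False into any THEANO_FLAGS that are already set,
# replacing an existing openmp=... entry rather than duplicating it.
existing = os.environ.get("THEANO_FLAGS", "")
flags = [f for f in existing.split(",") if f and not f.startswith("openmp=")]
flags.append("openmp=False")
os.environ["THEANO_FLAGS"] = ",".join(flags)

print(os.environ["THEANO_FLAGS"])

# Only now import the libraries, e.g.:
#   import pymc3 as pm
```

The total core usage is then bounded by the cores argument passed to pm.sample alone.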

banyan-y · May 20, 2020, 3:30am

I’m a beginner. When I run a program from a book in a Jupyter notebook, I have to pass cores=1 to pm.sample; otherwise it fails with: RuntimeError: The communication pipe between the main process and its spawned children is broken.
What should I do, please? Thank you for your guidance.
For example:
with model:
    step1 = pm.Metropolis(vars=[p, sds, centers])
    step2 = pm.ElemwiseCategorical(vars=[assignment])
    trace = pm.sample(15000, step=[step1, step2])
The error is:
BrokenPipeError Traceback (most recent call last)
D:\ProgramData\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, step_method, chain, seed, start)
    241 try:
--> 242 self._process.start()
    243 except IOError as e:

D:\ProgramData\Anaconda3\lib\multiprocessing\process.py in start(self)
    111 _cleanup()
--> 112 self._popen = self._Popen(self)
    113 self._sentinel = self._popen.sentinel

D:\ProgramData\Anaconda3\lib\multiprocessing\context.py in _Popen(process_obj)
    222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
    224

D:\ProgramData\Anaconda3\lib\multiprocessing\context.py in _Popen(process_obj)
    321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
    323

D:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     88 reduction.dump(prep_data, to_child)
---> 89 reduction.dump(process_obj, to_child)
     90 finally:

D:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
     61

BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

RuntimeError Traceback (most recent call last)
in
      2 step1 = pm.Metropolis(vars=[p, sds, centers])
      3 step2 = pm.ElemwiseCategorical(vars=[assignment])
----> 4 trace = pm.sample(15000, step=[step1, step2])

D:\ProgramData\Anaconda3\lib\site-packages\pymc3\sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, **kwargs)
    467 _print_step_hierarchy(step)
    468 try:
--> 469 trace = _mp_sample(**sample_args)
    470 except pickle.PickleError:
    471 _log.warning("Could not pickle model, sampling singlethreaded.")

D:\ProgramData\Anaconda3\lib\site-packages\pymc3\sampling.py in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, **kwargs)
   1052
   1053 sampler = ps.ParallelSampler(
-> 1054 draws, tune, chains, cores, random_seed, start, step, chain, progressbar
   1055 )
   1056 try:

D:\ProgramData\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, chains, cores, seeds, start_points, step_method, start_chain_num, progressbar)
    357 draws, tune, step_method, chain + start_chain_num, seed, start
    358 )
--> 359 for chain, seed, start in zip(range(chains), seeds, start_points)
    360 ]
    361

D:\ProgramData\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py in (.0)
    357 draws, tune, step_method, chain + start_chain_num, seed, start
    358 )
--> 359 for chain, seed, start in zip(range(chains), seeds, start_points)
    360 ]
    361

D:\ProgramData\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py in __init__(self, draws, tune, step_method, chain, seed, start)
    249 # all its error message
    250 time.sleep(0.2)
--> 251 raise exc
    252 raise
    253

RuntimeError: The communication pipe between the main process and its spawned children is broken.
In Windows OS, this usually means that the child process raised an exception while it was being spawned, before it was setup to communicate to the main process.
The exceptions raised by the child process while spawning cannot be caught or handled from the main process, and when running from an IPython or jupyter notebook interactive kernel, the child’s exception and traceback appears to be lost.
A known way to see the child’s error, and try to fix or handle it, is to run the problematic code as a batch script from a system’s Command Prompt. The child’s exception will be printed to the Command Promt’s stderr, and it should be visible above this error and traceback.
Note that if running a jupyter notebook that was invoked from a Command Prompt, the child’s exception should have been printed to the Command Prompt on which the notebook is running.
The following code, with cores=1, runs until the end of the sampling:
with model:
    step1 = pm.Metropolis(vars=[p, sds, centers])
    step2 = pm.ElemwiseCategorical(vars=[assignment])
    trace = pm.sample(15000, step=[step1, step2], cores=1)

[I 10:47:19.420 NotebookApp] Saving file at /Probabilistic-Programming-and-Bayesian-Methods-for-Hackers-master/Chapter3_MCMC/Ch3_IntroMCMC_PyMC3.ipynb
[I 10:49:19.443 NotebookApp] Saving file at /Probabilistic-Programming-and-Bayesian-Methods-for-Hackers-master/Chapter3_MCMC/Ch3_IntroMCMC_PyMC3.ipynb
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "D:\ProgramData\Anaconda3\lib\site-packages\theano\gpuarray\__init__.py", line 227, in
    use(config.device)
  File "D:\ProgramData\Anaconda3\lib\site-packages\theano\gpuarray\__init__.py", line 214, in use
    init_dev(device, preallocate=preallocate)
  File "D:\ProgramData\Anaconda3\lib\site-packages\theano\gpuarray\__init__.py", line 65, in init_dev
    raise RuntimeError("You can't initialize the GPU in a subprocess if the parent process already did it")
RuntimeError: You can't initialize the GPU in a subprocess if the parent process already did it
Traceback (most recent call last):
  File "", line 1, in
  File "D:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "D:\ProgramData\Anaconda3\lib\site-packages\theano\gpuarray\type.py", line 899, in GpuArray_unpickler
    ctx = get_context(ctx_name)
  File "D:\ProgramData\Anaconda3\lib\site-packages\theano\gpuarray\type.py", line 104, in get_context
    raise ContextNotDefined("context name %s not defined" % (name,))
theano.gpuarray.type.ContextNotDefined: context name None not defined
[I 11:01:19.451 NotebookApp] Saving file at /Probabilistic-Programming-and-Bayesian-Methods-for-Hackers-master/Chapter3_MCMC/Ch3_IntroMCMC_PyMC3.ipynb

Environment: Windows 7, Anaconda Python 3.7, Theano 1.0.4, PyMC3 3.8

Thank you!

junpenglao · May 20, 2020, 5:44am

This is a known Windows issue (see "NUT sampler stuck under windows with njobs>1"), and unfortunately we don't yet have a good solution for it.

ProcessEngineer · May 20, 2020, 6:07pm

banyan-y I get the same error unless I put:

if __name__ == "__main__":

around my sampling script. Does wrapping your script in if __name__ == "__main__": resolve the issue?

banyan-y · May 21, 2020, 12:01am

Thank you! I have another problem. When I followed the notebook linked below, plotting failed. The only difference I can find from the notebook's results is that my tracker values have the type gpuarray; everything else is the same. How should I modify the code?
https://docs.pymc.io/notebooks/variational_api_quickstart.html
The code that fails is:
fig = plt.figure(figsize=(16, 9))
mu_ax = fig.add_subplot(221)
mu_ax.plot(tracker['mean'])
mu_ax.set_title('Mean track')
std_ax = fig.add_subplot(222)
std_ax.plot(tracker['std'])
std_ax.set_title('Std track')
hist_ax = fig.add_subplot(212)
hist_ax.plot(advi.hist)
hist_ax.set_title('Negative ELBO track');

And these are the errors:

AttributeError Traceback (most recent call last)
D:\ProgramData\Anaconda3\lib\site-packages\matplotlib\cbook\__init__.py in index_of(y)
   1673 try:
-> 1674 return y.index.values, y.values
   1675 except AttributeError:

AttributeError: 'builtin_function_or_method' object has no attribute 'values'

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
TypeError: float() argument must be a string or a number, not 'pygpu.gpuarray.GpuArray'

The above exception was the direct cause of the following exception:

ValueError Traceback (most recent call last)
in
      3 std_ax = fig.add_subplot(222)
      4 hist_ax = fig.add_subplot(212)
----> 5 mu_ax.plot(tracker['mean'])
      6
      7 mu_ax.set_title('Mean track')

D:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py in plot(self, scalex, scaley, data, *args, **kwargs)
   1663 """
   1664 kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D._alias_map)
-> 1665 lines = [*self._get_lines(*args, data=data, **kwargs)]
   1666 for line in lines:
   1667 self.add_line(line)

D:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axes\_base.py in __call__(self, *args, **kwargs)
    223 this += args[0],
    224 args = args[1:]
--> 225 yield from self._plot_args(this, kwargs)
    226
    227 def get_next_color(self):

D:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axes\_base.py in _plot_args(self, tup, kwargs)
    387 y = _check_1d(tup[-1])
    388 else:
--> 389 x, y = index_of(tup[-1])
    390
    391 x, y = self._xy_from_xy(x, y)

D:\ProgramData\Anaconda3\lib\site-packages\matplotlib\cbook\__init__.py in index_of(y)
   1674 return y.index.values, y.values
   1675 except AttributeError:
-> 1676 y = _check_1d(y)
   1677 return np.arange(y.shape[0], dtype=float), y
   1678

D:\ProgramData\Anaconda3\lib\site-packages\matplotlib\cbook\__init__.py in _check_1d(x)
   1397 '''
   1398 if not hasattr(x, 'shape') or len(x.shape) < 1:
-> 1399 return np.atleast_1d(x)
   1400 else:
   1401 try:

<__array_function__ internals> in atleast_1d(*args, **kwargs)

D:\ProgramData\Anaconda3\lib\site-packages\numpy\core\shape_base.py in atleast_1d(*arys)
     65 res = []
     66 for ary in arys:
---> 67 ary = asanyarray(ary)
     68 if ary.ndim == 0:
     69 result = ary.reshape(1)

D:\ProgramData\Anaconda3\lib\site-packages\numpy\core\_asarray.py in asanyarray(a, dtype, order)
    136
    137 """
--> 138 return array(a, dtype, copy=False, order=order, subok=True)
    139
    140

ValueError: setting an array element with a sequence.

My tracker['mean'] looks like this:
[gpuarray.array([0.33900008], dtype=float32),
gpuarray.array([0.33892253], dtype=float32),
gpuarray.array([0.3396845], dtype=float32),
gpuarray.array([0.34037325], dtype=float32),
gpuarray.array([0.3403642], dtype=float32),
gpuarray.array([0.33999673], dtype=float32),
gpuarray.array([0.34050706], dtype=float32),
gpuarray.array([0.3402906], dtype=float32),
gpuarray.array([0.34043068], dtype=float32),
gpuarray.array([0.34083998], dtype=float32),
gpuarray.array([0.34056437], dtype=float32),
gpuarray.array([0.3400177], dtype=float32),
gpuarray.array([0.33988926], dtype=float32),
gpuarray.array([0.34016877], dtype=float32),
gpuarray.array([0.33968213], dtype=float32),
gpuarray.array([0.33882535], dtype=float32),
gpuarray.array([0.33869746], dtype=float32),
gpuarray.array([0.33891052], dtype=float32),
…
Thank you for taking the time to help. Thank you very much!
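
The traceback above suggests that matplotlib cannot turn a list of GPU arrays into a NumPy array on its own. One possible workaround, sketched here under the assumption that each pygpu GpuArray can be converted with np.asarray, is to copy every entry to host memory before plotting. FakeDeviceArray and to_host are hypothetical names; the stand-in class exists only so the sketch runs without a GPU.

```python
import numpy as np


class FakeDeviceArray:
    """Hypothetical stand-in for pygpu.gpuarray.GpuArray (no GPU needed)."""

    def __init__(self, values):
        self._values = np.asarray(values, dtype="float32")

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when asked to convert the object.
        return self._values if dtype is None else self._values.astype(dtype)


def to_host(track):
    """Convert a list of device arrays into one NumPy array, one row per iteration."""
    return np.stack([np.asarray(item) for item in track])


# Stand-in for tracker['mean'] as shown in the post.
tracker_mean = [FakeDeviceArray([0.33900008]), FakeDeviceArray([0.33892253])]
host_mean = to_host(tracker_mean)
print(host_mean.shape)  # prints (2, 1)

# With the real tracker, the plotting call would then become something like:
#   mu_ax.plot(to_host(tracker['mean']))
```

If the conversion assumption does not hold for the installed pygpu version, configuring Theano to run on the CPU is another route that avoids GPU arrays in the tracker altogether.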

banyan-y · May 21, 2020, 2:31am

Thank you. How do I do that?

ProcessEngineer · May 21, 2020, 2:52am

I literally just indent my entire script and put if __name__ == "__main__": before it:

Example of a script that gives me “RuntimeError: The communication pipe between the main process and its spawned children is broken.” (example code taken from the Getting Started page)

import numpy as np
import pymc3 as pm

alpha, sigma = 1, 1
beta = [1, 2.5]
size = 100

X1 = np.random.randn(size)
X2 = np.random.randn(size) * 0.2

Y = alpha + beta[0]*X1 + beta[1]*X2 + np.random.randn(size)*sigma

basic_model = pm.Model()

with basic_model:

    # Priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10, shape=2)
    sigma = pm.HalfNormal('sigma', sigma=1)

    # Expected value of outcome
    mu = alpha + beta[0]*X1 + beta[1]*X2

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sigma=sigma, observed=Y)


with basic_model:
    # draw 500 posterior samples
    trace = pm.sample(500)

If I wrap the entire script in the statement if __name__ == "__main__": then it runs:

if __name__ == "__main__":
    import numpy as np
    import pymc3 as pm
    
    alpha, sigma = 1, 1
    beta = [1, 2.5]
    size = 100
    
    X1 = np.random.randn(size)
    X2 = np.random.randn(size) * 0.2
    
    Y = alpha + beta[0]*X1 + beta[1]*X2 + np.random.randn(size)*sigma
    
    basic_model = pm.Model()
    
    with basic_model:
    
        # Priors for unknown model parameters
        alpha = pm.Normal('alpha', mu=0, sigma=10)
        beta = pm.Normal('beta', mu=0, sigma=10, shape=2)
        sigma = pm.HalfNormal('sigma', sigma=1)
    
        # Expected value of outcome
        mu = alpha + beta[0]*X1 + beta[1]*X2
    
        # Likelihood (sampling distribution) of observations
        Y_obs = pm.Normal('Y_obs', mu=mu, sigma=sigma, observed=Y)
    
    
    with basic_model:
        # draw 500 posterior samples
        trace = pm.sample(500)

I think you only need to wrap the actual lines of code where you run pm.sample with this, but I’ve just made a habit of wrapping it around the whole script for convenience.

banyan-y · May 21, 2020, 3:16am

Thanks! That’s a good idea, I’ll try it.
