Hi,
I'm trying to run PyMC3 on GPUs. With init="auto", the initialization phase fails with the error below. With init="advi", initialization succeeds, but the sampling phase then fails with the same error. I found a similar issue at https://github.com/pymc-devs/pymc3/issues/3087, but the suggested solution (setting jobs=1 in pm.sample) does not seem feasible because "jobs" is no longer a parameter in the latest PyMC3.
Thanks,
Yurong
More details:
- PyMC3 version: 3.9.3
- Instance: AWS EC2 p2.8xlarge instance with 8 GPUs, 32 vCPUs
- Theano config:
  [cuda]
  root=/usr/local/cuda
  [global]
  floatX = float32
  device = cuda0
  [lib]
  cnmem=1000
Error traceback:
gof.link.raise_with_op(
File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/theano/gof/link.py", line 325, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/six.py", line 702, in reraise
raise value.with_traceback(tb)
File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/theano/compile/function_module.py", line 903, in __call__
self.fn() if output_subset is None else
File "pygpu/gpuarray.pyx", line 700, in pygpu.gpuarray.pygpu_empty
File "pygpu/gpuarray.pyx", line 301, in pygpu.gpuarray.array_empty
pygpu.gpuarray.GpuArrayException: b'cuEventCreate: CUDA_ERROR_NOT_INITIALIZED: initialization error'
Apply node that caused the error: GpuFromHost(__args_joined)
Toposort index: 1
Inputs types: [TensorType(float32, vector)]
Inputs shapes: [(10029,)]
Inputs strides: [(4,)]
Inputs values: ['not shown']
Outputs clients: [[GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{0}, Constant{1}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{9983}, Constant{10006}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{1}, Constant{9983}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10006}, Constant{10007}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10007}, Constant{10008}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10008}, Constant{10009}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10009}, Constant{10010}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10010}, Constant{10011}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10011}, Constant{10012}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10012}, Constant{10013}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10013}, Constant{10014}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10014}, Constant{10015}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10015}, Constant{10016}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10016}, Constant{10017}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10017}, Constant{10018}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10018}, Constant{10019}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10019}, Constant{10020}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10020}, Constant{10021}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10021}, Constant{10022}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10022}, Constant{10023}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10023}, Constant{10024}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10024}, Constant{10025}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10025}, Constant{10026}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10026}, Constant{10027}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10027}, Constant{10028}), GpuSubtensor{int64:int64:}(GpuFromHost.0, Constant{10028}, 
Constant{10029})]]
Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/pymc3/sampling.py", line 415, in sample
start_, step = init_nuts(
File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/pymc3/sampling.py", line 1689, in init_nuts
step = pm.NUTS(potential=potential, model=model, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/pymc3/step_methods/hmc/nuts.py", line 148, in __init__
super().__init__(vars, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/pymc3/step_methods/hmc/base_hmc.py", line 72, in __init__
super().__init__(vars, blocked=blocked, model=model, dtype=dtype, **theano_kwargs)
File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/pymc3/step_methods/arraystep.py", line 227, in __init__
func = model.logp_dlogp_function(
File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/pymc3/model.py", line 819, in logp_dlogp_function
return ValueGradFunction(self.logpt, grad_vars, extra_vars, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/pymc3/model.py", line 546, in __init__
self._vars_joined, self._cost_joined = self._build_joined(
File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/pymc3/model.py", line 627, in _build_joined
args_joined = tt.vector('__args_joined')
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "modeling_poc.py", line 146, in <module>
trace = pm.sample(SAMPLE, chains=CHAINS, target_accept=0.99, init="auto", tune=1000, n_init=1000, random_seed=999)
File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/pymc3/sampling.py", line 469, in sample
trace = _mp_sample(**sample_args)
File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/pymc3/sampling.py", line 1059, in _mp_sample
for draw in sampler:
File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/pymc3/parallel_sampling.py", line 394, in __iter__
draw = ProcessAdapter.recv_draw(self._active)
File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/pymc3/parallel_sampling.py", line 297, in recv_draw
raise error from old_error
RuntimeError: Chain 0 failed.
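For reference, the final traceback goes through `_mp_sample`, which is the multiprocess sampling path. As far as I can tell, the argument from the linked issue was later renamed to `cores`, so the closest equivalent of that suggestion would be `cores=1`, which should keep all chains in the main process. Below is a minimal sketch of the call I would try. The helper name `build_sample_kwargs` and the draw/chain counts are placeholders of mine (the real `SAMPLE` and `CHAINS` values are not shown above), and I have not confirmed that this resolves the CUDA error on this setup.

```python
def build_sample_kwargs(draws, chains, seed):
    """Build keyword arguments for pm.sample (hypothetical helper).

    cores=1 asks PyMC3 to run the chains one after another in the
    current process instead of starting worker processes.
    """
    return dict(
        draws=draws,
        chains=chains,
        cores=1,             # assumption: replaces the old jobs/njobs=1 advice
        tune=1000,
        init="advi",         # the init that ran successfully in the post
        n_init=1000,
        target_accept=0.99,
        random_seed=seed,
    )


# Placeholder values; substitute the real SAMPLE and CHAINS.
kwargs = build_sample_kwargs(draws=500, chains=2, seed=999)
print(kwargs)

# Inside the existing `with model:` block in modeling_poc.py:
#     trace = pm.sample(**kwargs)
```

The trade-off is that chains are sampled sequentially, so wall-clock time grows roughly with the number of chains.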