NUTS sampling doesn't work

Hi Leon,

I ran your example on:

Windows 10
PyMC3 v3.9.3

Calling pm.sample() with the default options works like a charm with sampling being very fast and using all my cores (4 chains on 4 cores).

However, when I call pm.sample(init='advi') as in your code, I first get the following warning (which I believe is not relevant to the current situation, but I include it for completeness…)

Auto-assigning NUTS sampler...
Initializing NUTS using advi...
C:\Users\xxxx\miniconda3\envs\workshop_env\lib\site-packages\theano\gpuarray\dnn.py:184: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to a version >= v5 and <= v7.
  warnings.warn("Your cuDNN version is more recent than "

Then ADVI is interrupted at 6% with:

Convergence achieved at 13200
Interrupted at 13,199 [6%]: Average Loss = 235.24
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, beta, alpha]

After that I get an [Errno 32] Broken pipe, which likely signals that something went astray with multiprocessing. Indeed, if I look at the terminal I can see this beast:

 Process worker_chain_0:
Traceback (most recent call last):
  File "C:\Users\xxx\miniconda3\envs\workshop_env\lib\site-packages\pymc3\parallel_sampling.py", line 114, in _unpickle_step_method
    self._step_method = pickle.loads(self._step_method)
  File "C:\Users\xxx\miniconda3\envs\workshop_env\lib\site-packages\theano\compile\function_module.py", line 1082, in _constructor_Function
    f = maker.create(input_storage, trustme=True)
  File "C:\Users\xxxxx\miniconda3\envs\workshop_env\lib\site-packages\theano\compile\function_module.py", line 1715, in create
    input_storage=input_storage_lists, storage_map=storage_map)
  File "C:\Users\xxxxx\miniconda3\envs\workshop_env\lib\site-packages\theano\gof\link.py", line 699, in make_thunk
    storage_map=storage_map)[:3]
  File "C:\Users\xxxxx\miniconda3\envs\workshop_env\lib\site-packages\theano\gof\vm.py", line 1091, in make_all
    impl=impl))
  File "C:\Users\xxxxx\miniconda3\envs\workshop_env\lib\site-packages\theano\gof\op.py", line 955, in make_thunk
    no_recycling)
  File "C:\Users\xxxxxx\miniconda3\envs\workshop_env\lib\site-packages\theano\gof\op.py", line 858, in make_c_thunk
    output_storage=node_output_storage)
  File "C:\Users\xxxxxx\miniconda3\envs\workshop_env\lib\site-packages\theano\gof\cc.py", line 1217, in make_thunk
    keep_lock=keep_lock)
  File "C:\Users\xxxxx\miniconda3\envs\workshop_env\lib\site-packages\theano\gof\cc.py", line 1157, in __compile__
    keep_lock=keep_lock)
  File "C:\Users\xxxxx\miniconda3\envs\workshop_env\lib\site-packages\theano\gof\cc.py", line 1624, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "C:\Users\xxxxx\miniconda3\envs\workshop_env\lib\site-packages\theano\gof\cmodule.py", line 1189, in module_from_key
    module = lnk.compile_cmodule(location)
  File "C:\Users\xxxxx\miniconda3\envs\workshop_env\lib\site-packages\theano\gof\cc.py", line 1527, in compile_cmodule
    preargs=preargs)
  File "C:\Users\xxxxxx\miniconda3\envs\workshop_env\lib\site-packages\theano\gof\cmodule.py", line 2399, in compile_str
    (status, compile_stderr.replace('\n', '. ')))
Exception: ('The following error happened while compiling the node', Reshape{0}(Subtensor{int64:int64:}.0, TensorConstant{[]}), '\n', 'Compilation failed (return status=3): ', '[Reshape{0}(<TensorType(float64, vector)>, TensorConstant{[]})]')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\xxxxx\miniconda3\envs\workshop_env\lib\site-packages\pymc3\parallel_sampling.py", line 135, in run
    self._unpickle_step_method()
  File "C:\Users\xxxxx\miniconda3\envs\workshop_env\lib\site-packages\pymc3\parallel_sampling.py", line 116, in _unpickle_step_method
    raise ValueError(unpickle_error)
ValueError: The model could not be unpickled. This is required for sampling with more than one core and multiprocessing context spawn or forkserver.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\xxxxxx\miniconda3\envs\workshop_env\lib\multiprocessing\connection.py", line 312, in _recv_bytes
    nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\xxxxxx\miniconda3\envs\workshop_env\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Users\xxxxx\miniconda3\envs\workshop_env\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\xxxxx\miniconda3\envs\workshop_env\lib\site-packages\pymc3\parallel_sampling.py", line 232, in _run_process
    _Process(*args).run()
  File "C:\Users\xxxxx\miniconda3\envs\workshop_env\lib\site-packages\pymc3\parallel_sampling.py", line 145, in run
    self._wait_for_abortion()
  File "C:\Users\xxxxx\miniconda3\envs\workshop_env\lib\site-packages\pymc3\parallel_sampling.py", line 151, in _wait_for_abortion
    msg = self._recv_msg()
  File "C:\Users\xxxxxx\miniconda3\envs\workshop_env\lib\site-packages\pymc3\parallel_sampling.py", line 169, in _recv_msg
    return self._msg_pipe.recv()
  File "C:\Users\xxxxx\miniconda3\envs\workshop_env\lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "C:\Users\xxxxxx\miniconda3\envs\workshop_env\lib\multiprocessing\connection.py", line 321, in _recv_bytes
    raise EOFError

I am by no stretch of the imagination knowledgeable about the PyMC3 backend, but it seems quite evident that the first chain failed, causing a problem with one of the workers, which in turn led to the multiprocessing error (multiprocessing is known to be a bit awkward on Windows, and in IPython environments specifically).
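The ValueError in the middle of the traceback says the step method could not be unpickled inside the worker. As a rough way to separate "the object cannot be pickled at all" from "something breaks only in the worker", here is a minimal sketch of a round-trip check you could run in the main process. Note that pickle_round_trip is a hypothetical helper of mine, not part of PyMC3, and a success here does not guarantee success in a freshly spawned worker (in your traceback the failure is a Theano compilation error during unpickling).

```python
import pickle


def pickle_round_trip(obj):
    """Return (ok, error) after trying to pickle and then unpickle obj.

    Hypothetical helper: it mimics, in the current process, the
    serialization step that spawn-based multiprocessing relies on.
    """
    try:
        pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
    except Exception as exc:  # pickling failures come in several types
        return False, exc
    return True, None


if __name__ == "__main__":
    # A plain dict survives the round trip.
    print(pickle_round_trip({"alpha": 0.0, "beta": [1.0, 2.0]}))
    # A lambda cannot be pickled, so the check reports a failure.
    print(pickle_round_trip(lambda x: x)[0])
```

With a real model you would pass in the step method (for example one built with pm.NUTS() inside the model context) instead of the toy objects above.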

THE ACTUAL ANSWER
My gut feeling is that the initialization generated by advi created some sort of instability, leading one of the chains to fail. I know there have been problems with the choice of init before (Initialization energy is NaN or Inf with jitter). As a test, try using the default options.
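To make that test concrete, here is a small sketch that tries the advi initialization first and falls back to the default one if sampling raises. sample_with_fallback is a hypothetical wrapper I made up for illustration; it only assumes that the sampling function accepts an init keyword, as pm.sample does. The name "jitter+adapt_diag" is what I understand the default initialization to resolve to in PyMC3, so treat that as an assumption too.

```python
def sample_with_fallback(sample_fn, inits=("advi", "jitter+adapt_diag"), **kwargs):
    """Try each init in turn and return (init, result) for the first success.

    Hypothetical wrapper: sample_fn is any callable that accepts an
    `init` keyword (for example pm.sample, called inside a model context).
    """
    errors = {}
    for init in inits:
        try:
            return init, sample_fn(init=init, **kwargs)
        except Exception as exc:  # remember the failure and try the next init
            errors[init] = exc
    raise RuntimeError(f"all initializations failed: {errors!r}")


if __name__ == "__main__":
    # Stand-in sampler so the sketch runs without PyMC3 installed:
    # it fails for "advi" and succeeds for anything else.
    def fake_sample(init, **kwargs):
        if init == "advi":
            raise ValueError("simulated worker failure")
        return {"init": init, **kwargs}

    print(sample_with_fallback(fake_sample, draws=1000, tune=1000))
```

With the real library the call would look like sample_with_fallback(pm.sample, draws=1000, tune=1000) inside the with model: block. If the default init works and advi does not, that points at the initialization rather than at multiprocessing as such.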

Unless I am completely off-track I believe you could get some extra insight from the PyMC3 team.
