PyMC3 getting stuck after initialization

Hello,

I started a model last night before going to bed, and PyMC3 got stuck and never started sampling. I’m running PyMC3 version 3.6.

Auto-assigning NUTS sampler...
Initializing NUTS using advi...
Average Loss = 2.3802e+08:   6%|▋         | 12606/200000 [00:06<01:28, 2117.90it/s]
Convergence achieved at 12800
Interrupted at 12,799 [6%]: Average Loss = 2.918e+08
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sd, intercept]
Sampling 4 chains:   0%|          | 0/22000 [00:00<?, ?draws/s]

Model below.

import pymc3 as pm

# Y_train is the array of observed sales, defined earlier in the notebook
with pm.Model() as sales_model:
    alpha = pm.Normal('intercept', mu=0, sd=50)
    s = pm.Normal('sd', mu=0, sd=50)

    # define the likelihood
    mu = alpha
    y = pm.StudentT('sales', nu=(len(Y_train) - 1), mu=mu, sd=s,
                    observed=Y_train, shape=Y_train.shape)

    trace = pm.sample(draws=5000, init='advi', progressbar=True)

print(sales_model.check_test_point())

Are you on Windows?

Another important question besides the OS is whether you were running the sampling in a Jupyter notebook.

Are there known issues with PyMC3 and jupyter? That’s how I’ve been running most of my models.

yes. Windows 10.

yes. this is in a jupyter notebook. Is that an issue?

Most likely it is a multi-processing error under windows - try setting cores=1 in pm.sample()
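
A minimal sketch of that workaround, assuming the sales_model from the first post. The helper sample_kwargs is hypothetical (it is not part of PyMC3); it only builds the keyword arguments and adds cores=1 on Windows, so that pm.sample runs the chains one after another in the main process instead of spawning workers.

```python
import sys


def sample_kwargs(draws=5000, single_core=None):
    """Build keyword arguments for pm.sample (hypothetical helper).

    When single_core is None, fall back to one core on Windows only.
    """
    if single_core is None:
        single_core = sys.platform.startswith("win")
    kwargs = {"draws": draws, "init": "advi", "progressbar": True}
    if single_core:
        # cores=1 keeps all chains in the main process, so nothing is spawned.
        kwargs["cores"] = 1
    return kwargs


# Intended use inside the model context (needs PyMC3 and the model above):
#     with sales_model:
#         trace = pm.sample(**sample_kwargs())
```

Sampling will be slower than with four parallel chains, but a failure then surfaces as an ordinary exception in the notebook.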

No, Jupyter itself is not the issue. The issue is running sample with multiple cores on Windows.

The problem is that Windows has to spawn new processes because it cannot fork them. Sometimes these processes raise an exception while spawning. At that stage (before they have been fully initialized) they cannot communicate the exception to the main process, so they just write it to their own stderr. In a Jupyter notebook, the spawned process’s stderr is the stderr of the terminal running the notebook server. As a result, the main process running the notebook sometimes just seems to freeze without raising anything. This thread talks more about the issue.
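
To check which start method a given machine uses, the standard library can be queried directly. This is only an illustrative sketch, and start_method_report is a made-up helper name. On Windows the list of available methods contains only spawn.

```python
import multiprocessing as mp
import sys


def start_method_report():
    """Summarise the multiprocessing start methods available here."""
    methods = mp.get_all_start_methods()  # the first entry is the default
    return {
        "platform": sys.platform,
        "default": methods[0],
        "fork_available": "fork" in methods,
    }


print(start_method_report())
```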

Thank you. What do you suggest? Running in pycharm? spyder? I will read that post. Have a diaper to change right now though.

Nevermind. I ran through the length of the issue. It ended with a thought that it may be a theano problem. Have you heard any more possible solutions?

Thank you as always @junpenglao.

This allowed it to run. Is there another way to run multiprocessor though?

In general, yes. The thing is that you need to be able to debug the error that happens in the spawned processes. Try reading through this question thread to see how to get at the actual exception that is raised while the multiprocessing workers are being built. We found that if you run your script from a command prompt (save it as your_script.py and run python.exe your_script.py; do not run it from Spyder or PyCharm, as these may use an IPython kernel), you should be able to see the full traceback of the errors in the spawned processes. Usually the last error printed to the prompt is a broken pipe error, which happens in the main process and is not the real reason the script failed. Try that, see if you can find the traceback of the true error, and then post it here so we can help you.
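
Because the output of several processes ends up mixed together on the prompt, it can help to filter the pasted text for the lines that actually name an exception. The sketch below is a hypothetical helper (find_root_errors is not part of PyMC3 or Theano) and the sample text is a shortened stand-in for real output. It keeps every "SomeError: message" line except the BrokenPipeError raised by the main process.

```python
def find_root_errors(console_text):
    """Return exception lines from pasted console output, skipping the
    BrokenPipeError that the main process raises as a side effect."""
    errors = []
    for line in console_text.splitlines():
        stripped = line.strip()
        name = stripped.split(":", 1)[0]
        # Final traceback lines look like "SomeError: message";
        # dotted names such as "pkg.module.SomeError" are accepted too.
        is_exception = (
            name.replace(".", "").isidentifier()
            and name.endswith(("Error", "Exception"))
        )
        if is_exception and name != "BrokenPipeError":
            errors.append(stripped)
    return errors


# Shortened, illustrative sample of mixed console output.
sample = """\
Exception: ('Compilation failed (return status=1): ...')
  File "popen_spawn_win32.py", line 65, in __init__
BrokenPipeError: [Errno 32] Broken pipe
"""
print(find_root_errors(sample))
```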

Hello @lucianopaz. I’ve read through that question and tried to add

if __name__ == "__main__":

before my model. That did not work. I also ran the .py script through a command prompt as you suggested, and this is the error I got back.

WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)

You can find the C code in this temporary file: C:\Users\jorda\AppData\Local\Temp\theano_compilation_error_ic7zehe3
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
Traceback (most recent call last):
File "D:\Sales\sales_bayesian.py", line 79, in <module>
run_name="__mp_main__")
trace = pm.sample(draws=5000, init = 'advi', progressbar=True) File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\runpy.py", line 263, in run_path

  File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\sampling.py", line 439, in sample

pkg_name=pkg_name, script_name=fname)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\runpy.py", line 96, in _run_module_code
trace = _mp_sample(**sample_args)
mod_name, mod_spec, pkg_name, script_name) File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\sampling.py", line 986, in _mp_sample

  File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\runpy.py", line 85, in _run_code

chain, progressbar) exec(code, run_globals)

File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\parallel_sampling.py", line 313, in __init__
File "D:\Sales\sales_bayesian.py", line 61, in <module>
mu_a = pm.Normal('mu_a', mu = 0, sd = 10)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\distributions\distribution.py", line 41, in __new__
dist = cls.dist(*args, **kwargs)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\distributions\distribution.py", line 52, in dist
dist.__init__(*args, **kwargs)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\distributions\continuous.py", line 432, in __init__
self.variance = 1. / self.tau
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\theano\tensor\var.py", line 203, in __rtruediv__
return theano.tensor.basic.true_div(other, self)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\theano\gof\op.py", line 670, in __call__
no_recycling=[])for chain, seed, start in zip(range(chains), seeds, start_points)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\theano\gof\op.py", line 955, in make_thunk

  File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\parallel_sampling.py", line 313, in <listcomp>

no_recycling)
for chain, seed, start in zip(range(chains), seeds, start_points) File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\theano\gof\op.py", line 858, in make_c_thunk

File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\pymc3\parallel_sampling.py", line 204, in __init__
output_storage=node_output_storage)self._process.start()

File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\theano\gof\cc.py", line 1217, in make_thunk
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\multiprocessing\process.py", line 105, in start
keep_lock=keep_lock)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\theano\gof\cc.py", line 1157, in __compile__
keep_lock=keep_lock)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\theano\gof\cc.py", line 1620, in cthunk_factory
key=key, lnk=self, keep_lock=keep_lock)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\theano\gof\cmodule.py", line 1181, in module_from_key
module = lnk.compile_cmodule(location)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\theano\gof\cc.py", line 1523, in compile_cmodule
preargs=preargs)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\theano\gof\cmodule.py", line 2391, in compile_str
(status, compile_stderr.replace('\n', '. ')))
Exception: ('Compilation failed (return status=1): C:\Users\jorda\AppData\Local\Theano\compiledir_Windows-10-10.0.17134-SP0-Intel64_Family_6_Model_158_Stepping_10_GenuineIntel-3.6.5-64\tmpnqc__ki2\mod.cpp:1:0: sorry, unimplemented: 64-bit mode not compiled in\r. #include <Python.h>\r. ^\r. ', '[Elemwise{true_div,no_inplace}(TensorConstant{1.0}, TensorConstant{0.01})]')
self._popen = self._Popen(self)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

Great! Now you’ve got the cause of the error. The main part of the traceback is

File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\theano\gof\cc.py", line 1523, in compile_cmodule
preargs=preargs)
File "C:\Users\jorda\AppData\Local\conda\conda\envs\theano\lib\site-packages\theano\gof\cmodule.py", line 2391, in compile_str
(status, compile_stderr.replace('\n', '. ')))
Exception: ('Compilation failed (return status=1): C:\Users\jorda\AppData\Local\Theano\compiledir_Windows-10-10.0.17134-SP0-Intel64_Family_6_Model_158_Stepping_10_GenuineIntel-3.6.5-64\tmpnqc__ki2\mod.cpp:1:0: sorry, unimplemented: 64-bit mode not compiled in\r. #include <Python.h>\r. ^\r. ', '[Elemwise{true_div,no_inplace}(TensorConstant{1.0}, TensorConstant{0.01})]')

It looks really strange. It seems to complain about being asked to compile in 64-bit mode with a compiler that only supports 32-bit. I’m going to assume that you used conda to install everything, right? Do you remember what command you ran? Is your OS 32-bit or 64-bit? What versions of theano, numpy and pymc3 are you using? Have you changed any environment variables through conda before running your script?

Another thing: do you have more than one Python installation? And did you ever install the Python development files (i.e. Python.h and the libs needed to compile C code)?
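
Most of these questions can be answered from Python itself. The sketch below uses only the standard library. environment_report is a hypothetical helper name, and it assumes the compiler of interest is whichever g++ comes first on PATH (g++ is the compiler Theano looks for by default). A 64-bit MinGW build typically reports a target starting with x86_64, while a 32-bit-only build reports something else.

```python
import platform
import shutil
import struct
import subprocess
import sys


def environment_report():
    """Collect interpreter and compiler facts relevant to the questions above."""
    report = {
        "python_version": platform.python_version(),
        "python_bits": struct.calcsize("P") * 8,  # pointer size: 32 or 64
        "os": platform.platform(),
        "executable": sys.executable,  # shows which Python installation runs
        "gxx_path": shutil.which("g++"),  # first g++ found on PATH, if any
        "gxx_target": None,
    }
    if report["gxx_path"]:
        try:
            # `g++ -dumpmachine` prints the target the compiler was built for.
            result = subprocess.run(
                [report["gxx_path"], "-dumpmachine"],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
                timeout=30,
            )
            report["gxx_target"] = result.stdout.strip()
        except (OSError, subprocess.SubprocessError):
            pass  # leave gxx_target as None if the compiler cannot be run
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```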

pymc3: 3.5
theano: 1.0.2
numpy: 1.14.3
OS: Windows 10 64-bit

For pymc3, I did

conda install -c conda-forge pymc3

For the others, I would have just followed whatever documentation was online.

I don’t think I’ve changed any environment variables.

I have theano in its own environment, and that’s the environment pymc3 runs in. No development files though.

The weird thing is that I’ve run multi-core models before. All of a sudden it just stopped working, and I’m not sure why. I only use this computer for data science work.

Thanks for your willingness to help thus far!

Hi @lucianopaz. If this is a Windows issue, would it work to set up Ubuntu in a virtual environment and run from there? Is that possible? I’ve never worked with Ubuntu before, so I’m not even sure if I’ll be able to see graphics on it or if it’s just command line.

@jordan.howell2, sorry that I couldn’t look into this problem more in depth yet. The problem is caused by the spawn multiprocessing method, which is the default on Windows. Unix-like systems can use the fork multiprocessing method by default and are not affected by all of this weirdness. It is very likely that you would be able to run without problems in a Unix (for example Ubuntu) virtual environment. I think that you can do this with a Docker image, but I don’t really have experience with that.

Going back to your issue, I think it is a problem that lies deeper than pymc3. Recently, an issue was opened on theano regarding similar compile errors when the multiprocessing spawn tried to unpickle the process object. My wild guess, which I will explore eventually, is that during the spawn’s initialization the environment variables don’t match the main process’ environment, leading theano to fetch different libraries, sources and compilers, which then breaks.
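
One rough way to probe that guess is to compare the current process’s environment with the one a freshly started interpreter sees. This is only a proxy, under the assumption that a plain subprocess behaves like a multiprocessing spawn worker in this respect, which may not hold. Both helper names are made up for the illustration.

```python
import json
import os
import subprocess
import sys


def fresh_interpreter_env():
    """Return os.environ as seen by a newly started Python interpreter."""
    code = "import json, os; print(json.dumps(dict(os.environ)))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return json.loads(result.stdout)


def diff_env(parent, child):
    """Map each differing variable to its (parent, child) values."""
    keys = set(parent) | set(child)
    return {
        key: (parent.get(key), child.get(key))
        for key in sorted(keys)
        if parent.get(key) != child.get(key)
    }


# Variables such as PATH are the interesting ones: they decide which g++ is found.
print(diff_env(dict(os.environ), fresh_interpreter_env()))
```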

Thanks for the reply. So does this mean I’m doomed to never use multiple cores in pymc3? Does pystan do multiprocessing? Although I don’t want to learn Stan :(

Most definitely not. I can use multiprocessing on my windows 10 64bit installation. I just haven’t been able to pin down what is causing this strange compile error.

You can also try to install some compilers and libraries that would ensure you would be able to compile the nodes.

If you uninstall theano and pymc3, and then run

conda install mkl-service libpython m2w64-toolchain numpy scipy
conda install theano pygpu
conda install pymc3

You should get theano to detect the C compilers, BLAS libraries and Python header files correctly. This may solve your particular error, which looked like a senseless compilation error due to an unimplemented feature in Python.h.
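
After reinstalling, it may be worth checking what theano actually picked up. The sketch below reads a few standard theano configuration values (config.cxx, config.blas.ldflags, config.compiledir). theano_compiler_report is a hypothetical helper, and the import is wrapped so that the helper reports a failure instead of raising.

```python
def theano_compiler_report():
    """Return the compiler settings theano picked up, or the import error."""
    try:
        import theano  # imported lazily so the helper works without theano
    except Exception as exc:  # a diagnostic helper should report, not crash
        return {"import_error": repr(exc)}
    return {
        "theano_version": theano.__version__,
        "cxx": theano.config.cxx,  # empty string means no C++ compiler found
        "blas_ldflags": theano.config.blas.ldflags,
        "compiledir": theano.config.compiledir,
    }


print(theano_compiler_report())
```

If cxx points at a g++ outside the conda environment, that compiler is the one to inspect.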

Hello. I did as you suggested and no luck. There is a new error though.

You can find the C code in this temporary file: C:\Users\jorda\AppData\Local\Temp\theano_compilation_error_66tsc7s6
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\jorda\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\jorda\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
File "C:\Users\jorda\Anaconda3\lib\site-packages\theano\compile\function_module.py", line 1082, in _constructor_Function
f = maker.create(input_storage, trustme=True)
File "C:\Users\jorda\Anaconda3\lib\site-packages\theano\compile\function_module.py", line 1715, in create
input_storage=input_storage_lists, storage_map=storage_map)
File "C:\Users\jorda\Anaconda3\lib\site-packages\theano\gof\link.py", line 699, in make_thunk
storage_map=storage_map)[:3]
File "C:\Users\jorda\Anaconda3\lib\site-packages\theano\gof\vm.py", line 1091, in make_all
impl=impl))
File "C:\Users\jorda\Anaconda3\lib\site-packages\theano\gof\op.py", line 955, in make_thunk
no_recycling)
File "C:\Users\jorda\Anaconda3\lib\site-packages\theano\gof\op.py", line 858, in make_c_thunk
output_storage=node_output_storage)
File "C:\Users\jorda\Anaconda3\lib\site-packages\theano\gof\cc.py", line 1217, in make_thunk
keep_lock=keep_lock)
File "C:\Users\jorda\Anaconda3\lib\site-packages\theano\gof\cc.py", line 1157, in __compile__
keep_lock=keep_lock)
File "C:\Users\jorda\Anaconda3\lib\site-packages\theano\gof\cc.py", line 1620, in cthunk_factory
key=key, lnk=self, keep_lock=keep_lock)
File "C:\Users\jorda\Anaconda3\lib\site-packages\theano\gof\cmodule.py", line 1181, in module_from_key
module = lnk.compile_cmodule(location)
File "C:\Users\jorda\Anaconda3\lib\site-packages\theano\gof\cc.py", line 1523, in compile_cmodule
preargs=preargs)
File "C:\Users\jorda\Anaconda3\lib\site-packages\theano\gof\cmodule.py", line 2391, in compile_str
(status, compile_stderr.replace('\n', '. ')))
Exception: ('The following error happened while compiling the node', Shape_i{0}(__args_joined), '\n', 'Compilation failed (return status=1): C:\Users\jorda\AppData\Local\Theano\compiledir_Windows-10-10.0.17134-SP0-Intel64_Family_6_Model_158_Stepping_10_GenuineIntel-3.7.1-64\tmpm_8a4j1k\mod.cpp:1:0: sorry, unimplemented: 64-bit mode not compiled in\r. #include <Python.h>\r. ^\r. ', '[Shape_i{0}(__args_joined)]')

I’m not sure if this matters but when I typed

conda install mkl-service libpython m2w64-toolchain numpy scipy

it stated the requirements were already met.

@jordan.howell2, thanks for trying it out. The exception in the end is almost the same:

sorry, unimplemented: 64-bit mode not compiled in\r. #include <Python.h>

Damn hard to root out why this is happening, but it seems to be a theano problem. I’ll link this discourse thread to a similar issue submitted to theano to ask for suggestions.