HPC + Theano issues: Precompile the logp and derivative functions for sampling?

I have a model that I would like to apply to ~200,000 different data sets that share the same structure (measurements of ~200,000 different stars). I would like to run sampling on all of these objects, so to have this execute in finite time I’m trying to run it on our compute cluster over ~10 nodes (each with 40 cores). The way I’ve structured my code, I construct the model and read in the data on the main process, then use MPI to send batches of data + the model to each worker process. On each worker, I then iterate through the batch of data and use pm.set_data(...) to set the few star-specific observables before sampling. But this isn’t working for me, because a large fraction (but not all) of the workers die when Theano tries to use subprocess.Popen to compile the model with the new data (I think):

Traceback (most recent call last):
File "/mnt/ceph/users/apricewhelan/projects/schwimmbad/schwimmbad/mpi.py", line 81, in __init__
File "/mnt/ceph/users/apricewhelan/projects/schwimmbad/schwimmbad/mpi.py", line 135, in wait
    result = func(arg)
File "run-mixmodel.py", line 40, in worker
    model = helper.get_model(**model_kw)
File "/mnt/ceph/users/apricewhelan/projects/cuddly-system/scripts/model.py", line 124, in get_model
    M = pm.Data('M', np.eye(3))
File "/mnt/home/apricewhelan/.local/lib/python3.7/site-packages/pymc3/data.py", line 547, in __new__
    shared_object.dshape = tuple(shared_object.shape.eval())
File "/mnt/home/apricewhelan/.local/lib/python3.7/site-packages/Theano-1.0.4+51.gf1e4ec4-py3.7.egg/theano/tensor/var.py", line 287, in <lambda>
    shape = property(lambda self: theano.tensor.basic.shape(self))
File "/mnt/home/apricewhelan/.local/lib/python3.7/site-packages/Theano-1.0.4+51.gf1e4ec4-py3.7.egg/theano/gof/op.py", line 670, in __call__
File "/mnt/home/apricewhelan/.local/lib/python3.7/site-packages/Theano-1.0.4+51.gf1e4ec4-py3.7.egg/theano/gof/op.py", line 955, in make_thunk
File "/mnt/home/apricewhelan/.local/lib/python3.7/site-packages/Theano-1.0.4+51.gf1e4ec4-py3.7.egg/theano/gof/op.py", line 858, in make_c_thunk
File "/mnt/home/apricewhelan/.local/lib/python3.7/site-packages/Theano-1.0.4+51.gf1e4ec4-py3.7.egg/theano/gof/cc.py", line 1217, in make_thunk
File "/mnt/home/apricewhelan/.local/lib/python3.7/site-packages/Theano-1.0.4+51.gf1e4ec4-py3.7.egg/theano/gof/cc.py", line 1157, in __compile__
File "/mnt/home/apricewhelan/.local/lib/python3.7/site-packages/Theano-1.0.4+51.gf1e4ec4-py3.7.egg/theano/gof/cc.py", line 1624, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
File "/mnt/home/apricewhelan/.local/lib/python3.7/site-packages/Theano-1.0.4+51.gf1e4ec4-py3.7.egg/theano/gof/cmodule.py", line 1189, in module_from_key
    module = lnk.compile_cmodule(location)
File "/mnt/home/apricewhelan/.local/lib/python3.7/site-packages/Theano-1.0.4+51.gf1e4ec4-py3.7.egg/theano/gof/cc.py", line 1527, in compile_cmodule
File "/mnt/home/apricewhelan/.local/lib/python3.7/site-packages/Theano-1.0.4+51.gf1e4
Problem occurred during compilation with the command line below:
/cm/shared/sw/pkg/devel/gcc/7.4.0/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -march=broadwell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrtm -mhle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero -mno-pku -mno-rdpid --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=35840 -mtune=broadwell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/cm/shared/sw/pkg/devel/python3/3.7.3/lib/python3.7/site-packages/numpy/core/include -I/cm/shared/sw/pkg/devel/python3/3.7.3/include/python3.7m -I/mnt/home/apricewhelan/.local/lib/python3.7/site-packages/Theano-1.0.4+51.gf1e4ec4-py3.7.egg/theano/gof/c_code -L/cm/shared/sw/pkg/devel/python3/3.7.3/lib -fvisibility=hidden -o /mnt/home/apricewhelan/.theano/1714928/compiledir_Linux-3.10-el7.x86_64-x86_64-with-centos-7.8.2003-Core-x86_64-3.7.3-64/tmp6miwlfs0/mf0c597995cda5a5ddd6f2023fd9404c7f21824ed4f7f0d5318dee29bd7fd2b7c.so /mnt/home/apricewhelan/.theano/1714928/compiledir_Linux-3.10-el7.x86_64-x86_64-with-centos-7.8.2003-Core-x86_64-3.7.3-64/tmp6miwlfs0/mod.cpp -lpython3.7m
ERROR (theano.gof.cmodule): [Errno 14] Bad address: '/cm/shared/sw/pkg/devel/gcc/7.4.0/bin/g++'
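For concreteness, the batch-per-worker layout described above works out to 200,000 stars over ~10 nodes x 40 cores = 400 processes, or 500 stars per process (if one MPI rank acts only as the coordinator, divide by 399 instead). A minimal sketch of the splitting and the per-worker loop follows. `make_batches`, `run_batch`, and `fit_one` are hypothetical names, `fit_one` stands in for the `pm.set_data(...)` + `pm.sample()` step, and the even contiguous split is an assumption rather than what the original script does:

```python
def make_batches(n_stars, n_workers):
    """Return one (start, stop) index range per worker, as even as possible."""
    base, extra = divmod(n_stars, n_workers)
    batches = []
    start = 0
    for i in range(n_workers):
        # The first `extra` workers each take one additional star.
        stop = start + base + (1 if i < extra else 0)
        batches.append((start, stop))
        start = stop
    return batches


def run_batch(batch, fit_one):
    """Worker-side loop: keep one model, swap only the per-star data.

    `fit_one` is a hypothetical callable standing in for
    pm.set_data(...) followed by pm.sample(...) on a single star.
    """
    start, stop = batch
    return [fit_one(star_index) for star_index in range(start, stop)]


if __name__ == "__main__":
    # ~10 nodes x 40 cores = 400 worker processes.
    batches = make_batches(200_000, 10 * 40)
    print(len(batches), batches[0], batches[-1])
    # prints: 400 (0, 500) (199500, 200000)
```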

It has been extremely frustrating (and so far impossible) to debug, because it only happens to some workers, and for some reason it seems to go away for an indeterminate amount of time if I use a fresh conda installation of Python and its dependencies instead of the cluster-installed versions. But after a few test runs, it starts to fail with the same error even in that environment. Neither I nor our HPC staff can figure out what is causing this, and I haven’t found anyone else with similar issues, so I suspect it is something about the way our cluster is configured.

All of that to say: I’m stumped on the root cause, so I’m looking for a workaround that gets this code to run. One thing I’ve been trying to figure out is whether there is a way to precompile everything on the main process (which seems happy to compile) before pickling the model and sending it out to each worker process. Is this doable with pymc3? That is, can I precompile the logp and dlogp functions and tell pm.sample() to use the precompiled versions? Or does it have to recompile each time pm.set_data() is called?
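On the last question: as far as I understand, `pm.set_data` updates the model’s Theano shared variables via `set_value`, and a compiled Theano function that reads a shared variable sees the new value without being recompiled. So `set_data` by itself should not force a recompile; a compile happens when something builds a new function from the graph. The toy below mimics that split in plain Python. `SharedValue` and `compile_logp` are hypothetical stand-ins for illustration, not Theano’s API:

```python
class SharedValue:
    """Minimal stand-in for a Theano shared variable (hypothetical class)."""

    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value


compile_count = 0


def compile_logp(shared_obs):
    """Stand-in for an expensive compile step; counts how often it runs."""
    global compile_count
    compile_count += 1

    def logp(mu):
        # Gaussian log-likelihood with unit variance, up to a constant.
        # The data are read from the shared container at call time.
        return -0.5 * sum((x - mu) ** 2 for x in shared_obs.get_value())

    return logp


obs = SharedValue([1.0, 2.0, 3.0])
logp = compile_logp(obs)   # "compiled" once
first = logp(2.0)          # -0.5 * (1 + 0 + 1) = -1.0
obs.set_value([2.0, 2.0])  # analogous to pm.set_data(...)
second = logp(2.0)         # zero, and compile_count is still 1
```

The build count stays at 1 across data updates; the practical question is how to get `pm.sample()` to reuse an already-built function rather than building a new one.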

That does sound very voodoo indeed. Trying to precompile sounds like a good workaround. This is almost certainly a Theano thing rather than a PyMC3 thing, so I’d advise looking in that direction (e.g. https://groups.google.com/forum/#!topic/theano-users/7b2vpNviKYo). It might, however, require you to set the right flags in the PyMC3 core code.
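On the flags side, one thing that may be worth trying (an assumption, not something confirmed here) is giving each worker its own Theano compile directory on node-local disk, so that hundreds of processes are not compiling into, and locking, one shared directory on the network filesystem. Theano reads `THEANO_FLAGS` when it is first imported, so the variable has to be set before `import pymc3` in each worker. `base_compiledir` is a standard Theano config option; the helper `per_process_theano_flags` is a hypothetical name. Note that an empty per-process directory means every worker compiles its C modules once on its own, so this trades extra compilation for isolation and may not touch the `[Errno 14] Bad address` error at all:

```python
import os
import tempfile


def per_process_theano_flags(tag, base=None, existing=""):
    """Return a THEANO_FLAGS string whose base_compiledir is unique to `tag`.

    Any base_compiledir already in `existing` is replaced; other flags
    are passed through unchanged.
    """
    base = base or tempfile.gettempdir()  # node-local temp dir by default
    compiledir = os.path.join(base, f"theano-{tag}")
    flags = [
        f for f in existing.split(",")
        if f and not f.startswith("base_compiledir=")
    ]
    flags.append(f"base_compiledir={compiledir}")
    return ",".join(flags)


if __name__ == "__main__":
    # Must run before theano/pymc3 is imported in this worker process.
    # The process ID is used as the tag here; an MPI rank would also work.
    os.environ["THEANO_FLAGS"] = per_process_theano_flags(
        os.getpid(), existing=os.environ.get("THEANO_FLAGS", "")
    )
    # import pymc3 as pm  # import only after THEANO_FLAGS is set
```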

Yeah, I agree! But I was hoping someone more familiar with the pymc3 internals might have some ideas on where to start, because I think it will require some hacks within pymc3 too. From some spelunking, it looks like calling pm.sample() triggers a recompile on each worker, because internally it calls model.logp_dlogp_function(), which creates a new ValueGradFunction instance, and that always recompiles the model. I tried pickling the ValueGradFunction and monkeypatching it onto the model instance after unpickling both the model and the function instance, but even that triggers a recompile. I think that’s because pickling ValueGradFunction doesn’t properly dump the Theano function (its ._theano_function attribute)? Does anyone have ideas here?
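One possible explanation for the recompile after unpickling, sketched with hypothetical toy classes rather than pymc3’s real ones: if an object’s pickling protocol serializes only the recipe for a compiled function and rebuilds it on load, the build runs in the receiving process no matter what was done on the sender. As far as I know, Theano pickles a compiled function as its graph and re-links it when unpickled, which can mean C compilation (or at least a compile-cache lookup) on the worker; treat that as an assumption to verify. The toy below only demonstrates the general mechanism:

```python
import pickle

BUILD_CALLS = []


class CompiledFn:
    """Wraps an expensive-to-build function and records every build."""

    def __init__(self, source):
        self.source = source
        self._fn = self._build()

    def _build(self):
        BUILD_CALLS.append(self.source)  # stand-in for invoking a compiler
        return lambda x: x * 2

    def __call__(self, x):
        return self._fn(x)

    def __getstate__(self):
        # The built function (a lambda) is not picklable, so only the
        # recipe is serialized ...
        return {"source": self.source}

    def __setstate__(self, state):
        # ... and the function is rebuilt on load, i.e. on the worker.
        self.source = state["source"]
        self._fn = self._build()


f = CompiledFn("logp_dlogp")       # built once on the "main" process
g = pickle.loads(pickle.dumps(f))  # built again while unpickling
```

After these two lines `BUILD_CALLS` holds two entries, one per build, even though the object was only constructed once.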