Issues with PyMC Execution using Snakemake: PyTensor Errors

I think I ran into this same problem on a different cluster. The cause was that all the parallel processes were storing their compiled modules in the exact same folder: .pytensor/ in the home directory. For some background, when you use the C backend, the computational graph is optimized, then transpiled into C, and finally compiled into a shared object library with some Python hooks that make it importable. When a PyTensor process finishes, it sometimes cleans the compilation directory. Because you are running multiple jobs in parallel, one process can clean the compilation directory before another process has had the chance to import its compiled module, and that second process then fails.

My solution was to set a different compilation directory per process using an environment variable. The implementation will be specific to your shell and your cluster, so you'll have to work out those details yourself. What I did was write a very small bash script that sets one environment variable and then calls the Python script that does all the work:

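#!/usr/bin/env bash
# Give each job its own PyTensor compile directory so parallel jobs don't share one.
export PYTENSOR_FLAGS="compiledir=$HOME/.pytensor/compiledir_$(uuidgen)"
./scripts/simulation_study.py "$@"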

You would have to call this bash script instead of the Python script as the entry point in your cluster job setup, but after that you shouldn't run into these problems.
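If a wrapper script is awkward in your setup, the same idea can probably be done from inside Python. This is only a sketch, and it assumes that PyTensor reads PYTENSOR_FLAGS when it is first imported, so the variable has to be set before any `import pytensor` or `import pymc` in the process. The helper name `set_unique_compiledir` is made up for this example and is not part of PyTensor or PyMC.

```python
import os
import uuid
from pathlib import Path


def set_unique_compiledir(base=None):
    """Point PYTENSOR_FLAGS at a compile directory unique to this process.

    Hypothetical helper for illustration; not a PyTensor or PyMC function.
    """
    base = Path(base) if base is not None else Path.home() / ".pytensor"
    compiledir = base / f"compiledir_{uuid.uuid4().hex}"

    # Keep any other flags that are already set, but drop an existing
    # compiledir entry so only the new one remains.
    existing = os.environ.get("PYTENSOR_FLAGS", "")
    parts = [p for p in existing.split(",") if p and not p.startswith("compiledir=")]
    parts.append(f"compiledir={compiledir}")
    os.environ["PYTENSOR_FLAGS"] = ",".join(parts)
    return compiledir


# Call this at the very top of the script, before PyTensor is imported.
compiledir = set_unique_compiledir()
# import pymc as pm  # only import after the flag has been set
```

As with the bash version, every run gets a fresh directory, so nothing is reused between runs and you may want to clear out old compiledir_* folders from time to time.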