Hi,
I am able to limit the number of threads (and therefore cores) used with:
import os
# Set these before the numerical library (e.g. NumPy) is first imported
os.environ['MKL_NUM_THREADS'] = '4'
os.environ['OMP_NUM_THREADS'] = '4'
os.environ['GOTO_NUM_THREADS'] = '4'
Nevertheless, I am seeing some strange behavior. My machine has 56 cores, and with all of them in use my model takes 52h to run. With 8 cores it takes ~22h, and with 4 cores ~18h. I expected the wall time to go down as more cores are used, but it goes up instead. Any recommendations?
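
To narrow this down, a quick scaling check along the following lines might help. This is only a rough sketch: `run_with_threads` and the placeholder command are names made up for illustration, and the real test would need a short, representative slice of the model instead of the dummy child process used here.

```python
import os
import subprocess
import sys
import time

# Same variables as above; which one actually matters depends on the BLAS backend.
THREAD_VARS = ("MKL_NUM_THREADS", "OMP_NUM_THREADS", "GOTO_NUM_THREADS")


def run_with_threads(cmd, n_threads):
    """Run cmd in a fresh child process with the thread caps set.

    Returns (wall_seconds, stdout). A fresh process is used so the
    variables are in place before any numerical library is loaded.
    """
    env = dict(os.environ)
    for var in THREAD_VARS:
        env[var] = str(n_threads)
    start = time.perf_counter()
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return time.perf_counter() - start, result.stdout


if __name__ == "__main__":
    # Hypothetical placeholder workload: it only echoes the setting it received.
    # Replace it with a command that runs a small part of the real model.
    cmd = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
    for n in (1, 2, 4, 8):
        seconds, out = run_with_threads(cmd, n)
        print(f"{n} threads: {seconds:.2f} s (child saw OMP_NUM_THREADS={out.strip()})")
```

Timing a short run at 1, 2, 4 and 8 threads should show where the wall time stops improving, which would be useful to include when reporting the problem.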