Hey,
Im currently debugging a PYMC3 based script for training a model basing on advi. The job is being run on a Kubernetes node - the pod that runs the job has 8vCPU available (request of 7500m specifically, without a limit). The problem is that when I am checking the CPU utilization graphs the actual usage never goes beyond 4 CPUs. However, in a local docker container it utilizes all available cores. Also, worth noting is that all of the virtual instances are based on Intel XEON CPUs and the node I am using has 8 virtual CPUs.
I have found a couple of places in pymc3 and theano where core detection comes into play. In pymc3 there is a parallel_processing script where CPU detection is now being done with multiprocessing.cpu_count() (previously psutil.cpu_count()). However, this one is only being used in sampling and in the experimental SMC-ABC. And we are not using sampling but variational inference (from inference module). In theano there are a couple of places and a cpuCount() function but on the
So, my question is, how does pymc3 handle multiprocessing, and how to control it? I see that in pymc3 it is possible to pass a dict argument to theano, but I haven’t found anything useful yet.
Another question would be: in case of variational inference and ADVI is it worth worrying about multiprocessing at all? Maybe the gains are so negligible that I should just forget about this and move on?
Forgive any factual errors I might have done in the area of machine learning but I am a QA Engineer trying to research a performance issue on our kubernetes setup.
Thanks in advance for any tips and pointers.
Best and stay safe,
Marcin