I have a small model which I need to be able to run on many separate datasets concurrently. The model takes about 5s to compile and roughly 20s to perform ADVI, per dataset. The goal here is to find a way to engineer the model so that it can be easily scaled out to as many CPUs as required, preferably without having to use containers.
The model is bottlenecked by sequential computations that cannot be optimized out (they're a core part of the compute), so each 'run' should be limited to 1 thread. However, given that compilation of the model comprises a substantial portion of the total CPU time, I want to re-use a compiled graph as much as possible. This suggests the following approaches:
Initialise a model and compile the graph once, then use theano.shared tensors to swap the observed data between runs. It's not clear how to 're-initialise' the state of the model between pm.fit calls - should I delete and recreate the ADVI object each time?
Explicitly instruct theano to use at most one CPU, then launch separate instances of the whole python program on the same machine, up to the number of CPU cores available. This might work, but experimentation suggests I'll run into problems with the size of the tmp files from graph-compilation byproducts, as well as theano compile-lock conflicts.
Does anybody know of a good way to deal with this? There's always the approach of simply having a large number of very small (1 CPU, low RAM) container instances, each running a separate instance of the python program and cycling through datasets. However, I'm hoping to avoid that if possible, as I have little experience with docker/kubernetes.