How to run PyMC3 in a multi-node cluster? Is it possible at the moment?

Hi all!

First of all, thanks a lot for developing and contributing to PyMC3!

Straight to the point: I have been wondering whether it is possible to run PyMC3 in HPC environments, like supercomputers with several nodes. Does anyone have experience to share? It would be of great help to me.

The thing is: I have a really hard and computationally intensive problem. I solve differential equations with a lot of parameters, all the fancy stuff, etc. Imagine how long each realization takes… well, so far SMC has been saving me and my group, since it runs nicely in parallel on a single-node machine with a lot of cores. However, we also have a large supercomputer available to us, and initial tests on it have failed.

I did some investigation and found that, behind SMC sampling, multiprocessing is being used. This is a pretty nice lib, but AFAIK it can't handle multi-node machines. However, there is hope: ray and its multiprocessing implementation. With a single-line change in the code (more precisely, in the import), multiprocessing can be used on a cluster by providing a proper configuration file (which is the user's responsibility). What do you PyMC3 devs think about it? Would it be worthwhile?

I have been thinking about trying to do some contributions to PyMC3. In case you like the above idea, it would be a pleasure to open a PR from my side :slight_smile:

Cheers!

I’m sure that any robust PR implementing new features would be very welcome.

I just wanted to check that you were aware of PyMC4? One of the aims seems to be better GPU usage via TensorFlow. I imagine that a byproduct of this would be better multi-node usage as well. I mention it in case it is helpful.


I am not familiar with ray - SMC needs access to all chains/batches at the end of each step for resampling, so I'm not sure whether that would work with ray. If you find it useful, we certainly welcome a PR (we can set it as a flag).
Did you try to see if your solution works?
Otherwise, as @sammosummo mentioned, you can look at PyMC4/TFP, to which I recently added SMC as well: https://github.com/tensorflow/probability/tree/master/tensorflow_probability/python/experimental/mcmc/examples (there is even an ODE example)

Other efforts to speed up ODEs include @aseyboldt's work, which you might also find helpful: https://github.com/aseyboldt/sunode


Thanks for your reply, @sammosummo!

I’m sure that any robust PR implementing new features would be very welcome.

Great! I’m working on some tests.

I just wanted to check that you were aware of PyMC4? One of the aims seems to be better GPU usage via TensorFlow. I imagine that a byproduct of this would be better multi-node usage as well. I mention it in case it is helpful.

Ah, yes! PyMC4 looks pretty interesting; I checked it out. However, since this is very critical research (about COVID-19), PyMC4 didn't seem appropriate, as it describes itself as an early-stage project. So we decided to use a consolidated tool (PyMC3) instead of a newer one still under development. But we do want to test PyMC4 in the future because of its GPU capabilities, as you mentioned. Thanks!

Hi, @junpenglao! Thanks for your reply!

I am not familiar with ray - SMC needs access to all chains/batches at the end of each step for resampling, so I'm not sure whether that would work with ray. If you find it useful, we certainly welcome a PR (we can set it as a flag).

I think ray will do alright. If multiprocessing does the right thing, ray will as well. The only difference is that ray's multiprocessing implementation can be aware of multiple nodes, while the standard multiprocessing can't detect or set up workers across multiple nodes, at least that's my understanding.

Did you try to see if your solution works?

I’m working on it at this moment! If everything looks good, I’ll submit a PR for you guys to check it out.

Otherwise, as @sammosummo mentioned, you can look at PyMC4/TFP, to which I recently added SMC as well: https://github.com/tensorflow/probability/tree/master/tensorflow_probability/python/experimental/mcmc/examples (there is even an ODE example)

Great work, well done! However, there is no clear API at the moment, and generating results with it while working with colleagues who have no previous experience with the tool would require time we unfortunately don't have. But, as I mentioned before, I'm interested in learning it in the future. I didn't know that SMC had been implemented in TFP by you; when we began the project, you hadn't submitted the PR yet. But I'm now watching the repo!

Other efforts to speed up ODEs include @aseyboldt's work, which you might also find helpful: https://github.com/aseyboldt/sunode

Looks amazing! I'll have a look at it, for sure! Thanks!! At the moment, we are using the LSODA solver from SciPy, which wraps the ODEPACK implementation.
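For reference, this is roughly how we call it, via `scipy.integrate.solve_ivp` with `method="LSODA"` (the decay problem here is just a toy stand-in for our actual model):

```python
# SciPy's LSODA wraps the ODEPACK implementation and switches
# automatically between stiff and non-stiff methods.
from scipy.integrate import solve_ivp

def decay(t, y):
    # toy test problem: dy/dt = -y, whose exact solution is y(t) = exp(-t)
    return -y

sol = solve_ivp(decay, (0.0, 1.0), [1.0], method="LSODA",
                rtol=1e-8, atol=1e-10)
print(sol.y[0, -1])  # close to exp(-1) ~ 0.3679
```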

Thanks again!

You are welcome! If the speed bottleneck persists and you are interested in exploring a solution in PyMC4/TFP, feel free to reach out and I can (or I will find someone who can ;-)) help you port the model and inference. FWIW, the TFP SMC implementation is very similar to PyMC3's, with the additional flexibility of using HMC as the internal mutation kernel, which should make it scale much better to higher dimensions. Also, if you are working on COVID and need to fit many time series at the same time, TFP generally gives better support for multi-batch, which means you can fit multiple copies of the same model at once.

Amazing, @junpenglao! Right now we are busy documenting our current results in a paper, but as soon as I finish this part, I will contact you. Multi-batch would actually be very useful, since we analyze and simulate several locations. TFP looks pretty exciting; I have to investigate it further.

Out of curiosity: I haven't compared your SMC with the one inside PyMC3. PyMC3's looks like an implementation of Cascading Adaptive Transitional Metropolis In Parallel (CATMIP), if I understood it right. Does your TFP implementation follow the one in PyMC3, or is it another method? Since you commented on the mutation (there is a similar mechanism in CATMIP, if I remember well), is it different compared to PyMC3?

Thanks!

Yep, we have adaptive tuning very similar to the one in PyMC3, but with the flexibility that you can do the same for HMC. We are planning to add more tuning, like the approach from https://arxiv.org/pdf/1808.07730.pdf
