How to run PyMC3 in a multi-node cluster? Is it possible at the moment?

Hey, I am trying to do something similar – Segmentation fault (core dumped) on running pm.sample in Ubuntu.
What is the most efficient way to do this right now, especially for large data? I have access to an HPC environment, but I’m not sure I’m using it to its fullest with PyMC3. When I run a PyMC3 program on 6 nodes, only one node sits at 100% CPU while the others are more or less idle, as checked with the seff command. Am I doing something wrong, or is this the expected behaviour?
From this discussion, here’s my takeaway:

  1. Right now, PyMC3 has no support for computing across multiple nodes (though there is scope to add it), so increasing the number of nodes won’t help.
  2. Increasing the number of cores does help and speeds up the sampling process.
  3. Best solution for now: use only a single node, but request as many cores on it as possible.
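
To make point 3 concrete, here is a minimal sketch of how I would set it up. It assumes the cluster uses Slurm (which is where seff comes from) and that the job was submitted with `--cpus-per-task`, so that `SLURM_CPUS_PER_TASK` is set. The helper names `available_cores` and `sample_on_one_node` are my own, not part of PyMC3. As far as I understand, `pm.sample` runs one chain per process, so I set `chains` equal to `cores`; extra cores beyond the number of chains would not speed up sampling itself.

```python
import os


def available_cores(default=1):
    """Return the number of CPU cores allotted to this job on the current node.

    Assumption: the job runs under Slurm and was submitted with
    --cpus-per-task, in which case Slurm sets SLURM_CPUS_PER_TASK.
    Otherwise fall back to the core count reported by the OS.
    """
    value = os.environ.get("SLURM_CPUS_PER_TASK")
    if value is not None and value.isdigit() and int(value) > 0:
        return int(value)
    return os.cpu_count() or default


def sample_on_one_node(model, draws=1000, tune=1000):
    """Sample a PyMC3 model with one chain per available core.

    `model` is an existing pm.Model instance. PyMC3 is imported lazily so
    the helper above can be used without it installed.
    """
    import pymc3 as pm

    n = available_cores()
    with model:
        # chains == cores: each worker process runs exactly one chain.
        return pm.sample(draws=draws, tune=tune, chains=n, cores=n)
```

With this, a job requesting a single node and, say, 16 CPUs per task would run 16 chains in parallel on that node by calling `sample_on_one_node(my_model)`, where `my_model` is whatever model you have already defined.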

Can someone confirm whether my takeaway is right?
Is there a better “best solution for now”?
My model is a probabilistic DAG with a time-series structure, and I want to do Bayesian inference with it. Are there any alternatives to PyMC3 that might be faster for my use case? How about Stan/PyStan, is it faster?

Thanks in advance for any help! :slight_smile: