```python
# define your model
import numpy as np
import pymc3 as pm
import theano

n = theano.shared(100)  # number of trials
X = theano.shared(50)   # number of observed successes

with pm.Model() as model:
    p = pm.Beta('p', alpha=2, beta=2)
    y_obs = pm.Binomial('y_obs', p=p, n=n, observed=X)

sample_size = [10, 100, 1000, 10000]

# for loop over sample sizes
for s in sample_size:
    n.set_value(s)
    X.set_value(np.sum(new_observed_X))  # new_observed_X: data simulated at size s
    with model:
        trace = pm.sample()
    # compute HPD
    hpd = pm.stats.hpd(trace['p'])
```
Is there a recommended approach to parallelizing this loop over increasing sample sizes? So far I have tried an EC2 instance with 96 cores, but that didn't really offer a speedup. I have also tried parallelizing the loop with Python's concurrent.futures library (a sketch of the attempt is below), but that didn't offer any speedup either. The only other idea I have is to run each model on a separate instance (e.g. with Kubernetes).
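For concreteness, this is roughly the shape of my concurrent.futures attempt; `fit_one` is just a stand-in for the loop body above, and the simulated data is a placeholder:

```python
from concurrent.futures import ProcessPoolExecutor

def fit_one(s):
    # defer heavy imports so each worker process builds its own model
    import numpy as np
    import pymc3 as pm

    new_observed_X = np.random.binomial(1, 0.5, size=s)  # placeholder simulated data
    with pm.Model():
        p = pm.Beta('p', alpha=2, beta=2)
        pm.Binomial('y_obs', p=p, n=s, observed=np.sum(new_observed_X))
        trace = pm.sample(cores=1)  # one chain worker per process
    return pm.stats.hpd(trace['p'])

if __name__ == '__main__':
    sample_size = [10, 100, 1000, 10000]
    with ProcessPoolExecutor(max_workers=4) as ex:
        hpds = list(ex.map(fit_one, sample_size))
```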
@chartl I tried concurrent.futures with multiprocessing workers, as well as an approach similar to yours. I'm getting this output:

```
INFO (theano.gof.compilelock): Waiting for existing lock by process '6288' (I am process '6289')
INFO (theano.gof.compilelock): To manually release the lock, delete /Users/me/.theano/compiledir_Darwin-17.7.0-x86_64-i386-64bit-i386-3.7.1-64/lock_dir
```
It looks like the workers do start in parallel, but theano's compile lock serializes them: the overall run time is the same as a plain for loop.
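One workaround I'm experimenting with, assuming the lock is contention on the shared ~/.theano compile directory, is to give every worker its own theano `base_compiledir` before theano is imported. Here `power` is a hypothetical module holding the `fit_one` worker from the sketch above:

```python
import multiprocessing as mp
import os
from concurrent.futures import ProcessPoolExecutor

def fit_isolated(s):
    # unique compile dir per process, set *before* theano is imported
    os.environ['THEANO_FLAGS'] = 'base_compiledir=/tmp/theano_%d' % os.getpid()
    from power import fit_one  # hypothetical module with fit_one from above
    return fit_one(s)

if __name__ == '__main__':
    # 'spawn' gives each worker a fresh interpreter, so the flag is
    # read when theano first loads inside fit_one
    ctx = mp.get_context('spawn')
    with ProcessPoolExecutor(max_workers=4, mp_context=ctx) as ex:
        hpds = list(ex.map(fit_isolated, [10, 100, 1000, 10000]))
```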
Is there another approach that you know of? The full run of the power analysis takes about 24 hours.