```python
# define your model
import numpy as np
import pymc3 as pm
import theano

n = theano.shared(100)  # number of trials
X = theano.shared(50)   # number of observed successes

with pm.Model() as model:
    p = pm.Beta('p', alpha=2, beta=2)
    y_obs = pm.Binomial('y_obs', p=p, n=n, observed=X)

sample_size = [10, 100, 1000, 10000]

# for loop over sample sizes
for s in sample_size:
    n.set_value(s)
    X.set_value(np.sum(new_observed_X))  # new_observed_X: data simulated at size s
    with model:
        trace = pm.sample()
    # compute HPD
    hpd = pm.stats.hpd(trace['p'])
```
Is there a recommended approach to parallelizing this loop over increasing sample sizes? So far I have tried an EC2 instance with 96 cores, but that didn't really offer a speedup. I have also tried parallelizing the loop with Python's concurrent.futures library (a sketch of the attempt is below), but that didn't offer any speedup either. The only other idea I have is to run each model on a separate instance (e.g. with Kubernetes).
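For concreteness, this is roughly the shape of my concurrent.futures attempt; `fit_one` is just a stand-in for the loop body above, and the simulated data is a placeholder:

```python
from concurrent.futures import ProcessPoolExecutor

def fit_one(s):
    # defer heavy imports so each worker process builds its own model
    import numpy as np
    import pymc3 as pm

    new_observed_X = np.random.binomial(1, 0.5, size=s)  # placeholder simulated data
    with pm.Model():
        p = pm.Beta('p', alpha=2, beta=2)
        pm.Binomial('y_obs', p=p, n=s, observed=np.sum(new_observed_X))
        trace = pm.sample(cores=1)  # one chain worker per process
    return pm.stats.hpd(trace['p'])

if __name__ == '__main__':
    sample_size = [10, 100, 1000, 10000]
    with ProcessPoolExecutor(max_workers=4) as ex:
        hpds = list(ex.map(fit_one, sample_size))
```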
@chartl I tried concurrent.futures with multiprocessing workers, as well as an approach similar to yours. I'm getting this output:

```
INFO (theano.gof.compilelock): Waiting for existing lock by process '6288' (I am process '6289')
INFO (theano.gof.compilelock): To manually release the lock, delete /Users/me/.theano/compiledir_Darwin-17.7.0-x86_64-i386-64bit-i386-3.7.1-64/lock_dir
```
It looks like the workers do start in parallel, but theano's compile lock serializes them: the overall run time is the same as a plain for loop.
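One workaround I'm experimenting with, assuming the lock is contention on the shared ~/.theano compile directory, is to give every worker its own theano `base_compiledir` before theano is imported. Here `power` is a hypothetical module holding the `fit_one` worker from the sketch above:

```python
import multiprocessing as mp
import os
from concurrent.futures import ProcessPoolExecutor

def fit_isolated(s):
    # unique compile dir per process, set *before* theano is imported
    os.environ['THEANO_FLAGS'] = 'base_compiledir=/tmp/theano_%d' % os.getpid()
    from power import fit_one  # hypothetical module with fit_one from above
    return fit_one(s)

if __name__ == '__main__':
    # 'spawn' gives each worker a fresh interpreter, so the flag is
    # read when theano first loads inside fit_one
    ctx = mp.get_context('spawn')
    with ProcessPoolExecutor(max_workers=4, mp_context=ctx) as ex:
        hpds = list(ex.map(fit_isolated, [10, 100, 1000, 10000]))
```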
Is there another approach that you know of? The full run of the power analysis takes about 24 hours.