I have a case similar to the one shown in the example here. The only difference is that my linear model reads its results from a file:
```python
import numpy as np

def my_model(theta, x):
    m, c = theta
    y = m * x + c
    # The output is written to a file and then read back from it
    np.savetxt('output.out', y, delimiter=',')
    z = np.loadtxt('output.out', delimiter=',', unpack=True)
    return z
```
When I sample it in parallel,
```python
idata_mh = pm.sample(3000, tune=1000, cores=2)
```
I get this error:
```
RuntimeError: Chain 0 failed.
```
I'm not sure how to handle parallelization when the model outputs have to be read from the same file. I guess I need to create as many folders as there are cores, so that each parallel process works in its own folder.
Hi @elchin, I’ll reply here so others can follow the thread.
I'm assuming this is a file access problem. If it's not, please post how the function above ends up in your PyMC model; you'll need a custom Op if you're not already using one.
It sounds like the different processes are colliding when they access that file?
The simplest solution I can think of would be to fetch the thread ID and include it in the file name. (Process IDs may be identical between parent and child on some operating systems if I remember correctly.)
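A minimal sketch of that idea, assuming each chain runs in its own worker process (as with `pm.sample(..., cores=2)`). It combines `os.getpid()` and `threading.get_ident()` from the standard library into the file name. The naming scheme is illustrative, not something PyMC provides.

```python
import os
import threading

import numpy as np


def my_model(theta, x):
    """Linear model whose output round-trips through a file that is unique
    to the current process and thread, so concurrent chains never share it."""
    m, c = theta
    y = m * x + c
    # Hypothetical naming scheme: process ID plus thread ID of the caller
    fname = f"output_{os.getpid()}_{threading.get_ident()}.out"
    np.savetxt(fname, y, delimiter=",")
    z = np.loadtxt(fname, delimiter=",")
    os.remove(fname)  # delete the scratch file once it has been read back
    return z
```

Deleting the file after reading keeps the working directory from filling up with one file per worker.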
Note that the Markov chains must be independent of each other, so if you're trying to access the same file from all chains: stop it. In the best case you'd get terrible performance (file locking); in the worst case you could even violate detailed balance.
I was thinking about generating a random number and creating a folder named after it. Each process would then run in its own designated folder. (I just need to make sure the random numbers are unique.)
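A sketch of the per-process folder idea, swapping the random number for the standard library's `tempfile.mkdtemp`. That function always creates a directory that did not exist before, so uniqueness is handled for you. The names `process_workdir` and `_WORKDIRS` are hypothetical, and the sketch assumes chains run as separate processes.

```python
import os
import tempfile

import numpy as np

# One scratch folder per process, keyed by PID, so that a forked worker does
# not reuse a folder its parent may have created before the fork.
_WORKDIRS = {}


def process_workdir():
    """Return this process's private scratch folder, creating it on first use."""
    pid = os.getpid()
    if pid not in _WORKDIRS:
        # mkdtemp picks a name that does not exist yet and creates the directory
        _WORKDIRS[pid] = tempfile.mkdtemp(prefix=f"chain_pid{pid}_")
    return _WORKDIRS[pid]


def my_model(theta, x):
    """Linear model that keeps the fixed file name 'output.out' but writes it
    inside the per-process folder."""
    m, c = theta
    y = m * x + c
    fname = os.path.join(process_workdir(), "output.out")
    np.savetxt(fname, y, delimiter=",")
    return np.loadtxt(fname, delimiter=",")
```

`mkdtemp` does not clean up after itself, so the folders have to be removed (for example with `shutil.rmtree`) after sampling. If an external program produces the files, the same folder can be passed as `cwd=` to `subprocess.run`.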
Hi @michaelosthege, thanks for your previous reply. I ran into a similar problem, so I checked this thread.
Can you elaborate on how to get the chain ID in PyMC and pass it to my own likelihood function? My likelihood function uses multiple input and output files labeled with numbers, so I only need to pass the chain number to the likelihood to be able to run PyMC sampling in parallel. Otherwise I can only run it sequentially, which is slow.
Subprocesses (chains) do not know their chain number. See pm.sampling.parallel.py: the object that receives parameters through a pipe from the ProcessAdapter on the main process is never given the chain number.
Only with the PopulationStepper does this information end up in the child process.
However, all steppers are instantiated once in the main process and duplicated for the chains. The step method API does not specify that chain numbers are ever relevant for a stepper, and chain numbers are never passed to a logp function either.
Maybe if you describe what you're trying to achieve, and why you want to use chain numbers, we can help you find a less hackish solution?