Sampling in parallel when a model reads outputs from a file

Hi @elchin, I’ll reply here so others can follow the thread.

I’m assuming that this is a file access problem. If it’s not, please post how the function above ends up in your PyMC model; you’ll need a custom Op if you’re not already using one.

It sounds like the different processes are colliding when they access that file?
The simplest solution I can think of would be to fetch the thread ID and include it in the file name. (Process IDs may be identical between parent and child on some operating systems if I remember correctly.)

Check the Python 3.10.4 documentation for the threading module (“threading: Thread-based parallelism”).
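
As a rough sketch of the file-name idea: the helper name `worker_output_path`, the `model_output` stem and the `.txt` suffix are all made up for illustration, and I'm adding the process ID next to the thread ID as an extra precaution, which is my assumption rather than something your setup necessarily needs. Adapt it to however your function writes and reads its file.

```python
import os
import threading
from pathlib import Path


def worker_output_path(base_dir=".", stem="model_output", suffix=".txt"):
    """Build a file path that is specific to the calling process and thread.

    Hypothetical helper: the name and the naming scheme are illustrative only.
    """
    pid = os.getpid()                # process ID of the current worker
    tid = threading.get_native_id()  # OS-level thread ID (Python 3.8+)
    return Path(base_dir) / f"{stem}_{pid}_{tid}{suffix}"


if __name__ == "__main__":
    # Each worker would write to, and read back from, its own file.
    path = worker_output_path()
    path.write_text("1.0 2.0 3.0\n")
    values = [float(x) for x in path.read_text().split()]
    print(path.name, values)
    path.unlink()  # clean up the per-worker file
```

The point is only that every chain ends up with its own file, so nothing has to be locked or shared between them.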

Note that the Markov chains must be independent of each other, so if you’re trying to access the same file from all chains: stop it. In the best case you’d get terrible performance (file locking), and in the worst case you could even violate detailed balance.

Hope this helps, cheers :slight_smile:
