Does anyone know of any guides or docs for parallel sampling in the current version of PyMC (mp_ctx)?

Yes, cores only controls how many chains are run in parallel. MCMC is a sequential algorithm: each draw depends on the previous one, so once the chains are parallelized there are no further gains to be had from parallelism at the MCMC level. That said, within each MCMC step there can be, and often is, additional multiprocessing, but that happens independently of the cores argument. See here for relevant discussions.
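To make the point about cores concrete, here is a rough sketch in plain Python. It is not PyMC's implementation, just a toy random-walk Metropolis sampler, and the names `run_chain`, `sample_parallel` and `start_method` are made up for illustration. Each chain is an inherently sequential loop, so the only thing that gets spread across processes is whole chains. The multiprocessing context created here is presumably the kind of thing `mp_ctx` refers to, but treat that as an assumption.

```python
import math
import multiprocessing as mp
import random


def log_density(x):
    """Unnormalized log-density of a standard normal target."""
    return -0.5 * x * x


def run_chain(seed, n_draws=1000, step_size=1.0):
    """Run one random-walk Metropolis chain (hypothetical helper).

    Every draw depends on the previous one, so this loop cannot be
    split across workers.
    """
    rng = random.Random(seed)
    x = 0.0
    draws = []
    for _ in range(n_draws):
        proposal = x + rng.gauss(0.0, step_size)
        # Accept with probability min(1, p(proposal) / p(x)).
        log_ratio = log_density(proposal) - log_density(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        draws.append(x)
    return draws


def sample_parallel(seeds, cores, start_method="spawn"):
    """Run one chain per seed, distributing whole chains over `cores` processes."""
    ctx = mp.get_context(start_method)  # e.g. "spawn", "fork", "forkserver"
    with ctx.Pool(processes=cores) as pool:
        return pool.map(run_chain, seeds)


if __name__ == "__main__":
    # Four chains on four worker processes; more cores than chains would sit idle.
    chains = sample_parallel(seeds=[1, 2, 3, 4], cores=4)
    print([len(chain) for chain in chains])
```

With four seeds, asking for more than four processes buys nothing in this sketch, which is the same reason raising cores above the number of chains does not speed up sampling.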
