pm.sample does not work with cores > 1 in a Docker container

Hi,

The title says it all: on my local machine everything works fine, but inside the Docker container pm.sample() only works if I pass cores=1; with any other value it just gets stuck.

I tried the solution proposed in this discussion: Pm.sample gets stuck after init with cores > 1 - Questions - PyMC Discourse. But that just gives me a different error; I guess that fix was aimed at some issue with Jupyter Notebook?

Here's the stack trace when I include the mp.set_start_method('forkserver') call proposed in that discussion:

Multiprocess sampling (4 chains in 2 jobs)
NUTS: [constant, beta, noisy_score]
^C-------------------------------------------------------------------------------------------------| 0.00% [0/8000 00:00<00:00 Sampling 4 chains, 0 divergences]
^CTraceback (most recent call last):
  File "/app/commandline-classifier/implementation/BayesianLogisticRegressionModelFit.py", line 133, in <module>
    fit_object.fit_model()
  File "/app/commandline-classifier/implementation/BayesianLogisticRegressionModelFit.py", line 91, in fit_model
    idata = pm.sample(step=step, chains=4, cores=2)
  File "/opt/conda/envs/myenv/lib/python3.10/site-packages/pymc/sampling/mcmc.py", line 702, in sample
    return _sample_return(
  File "/opt/conda/envs/myenv/lib/python3.10/site-packages/pymc/sampling/mcmc.py", line 733, in _sample_return
    traces, length = _choose_chains(traces, tune)
  File "/opt/conda/envs/myenv/lib/python3.10/site-packages/pymc/backends/base.py", line 601, in _choose_chains
    raise ValueError("Not enough samples to build a trace.")
ValueError: Not enough samples to build a trace.
(myenv) root@f1f96055bf47:/app/commandline-classifier/implementation# nano BayesianLogisticRegressionModelFit.py
(myenv) root@f1f96055bf47:/app/commandline-classifier/implementation# python3 BayesianLogisticRegressionModelFit.py guin0x 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
Multiprocess sampling (4 chains in 2 jobs)
NUTS: [constant, beta, noisy_score]
Traceback (most recent call last):
  File "/opt/conda/envs/myenv/lib/python3.10/multiprocessing/forkserver.py", line 274, in main
    code = _serve_one(child_r, fds,
  File "/opt/conda/envs/myenv/lib/python3.10/multiprocessing/forkserver.py", line 313, in _serve_one
    code = spawn._main(child_r, parent_sentinel)
  File "/opt/conda/envs/myenv/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/opt/conda/envs/myenv/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/opt/conda/envs/myenv/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/opt/conda/envs/myenv/lib/python3.10/runpy.py", line 269, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/opt/conda/envs/myenv/lib/python3.10/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/opt/conda/envs/myenv/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/app/commandline-classifier/implementation/BayesianLogisticRegressionModelFit.py", line 7, in <module>
    mp.set_start_method('forkserver')
  File "/opt/conda/envs/myenv/lib/python3.10/multiprocessing/context.py", line 243, in set_start_method
    raise RuntimeError('context has already been set')
RuntimeError: context has already been set
Traceback (most recent call last):
  File "/app/commandline-classifier/implementation/BayesianLogisticRegressionModelFit.py", line 134, in <module>
    fit_object.fit_model()
  File "/app/commandline-classifier/implementation/BayesianLogisticRegressionModelFit.py", line 92, in fit_model
    idata = pm.sample(step=step, chains=4, cores=2)
  File "/opt/conda/envs/myenv/lib/python3.10/site-packages/pymc/sampling/mcmc.py", line 677, in sample
    _mp_sample(**sample_args, **parallel_args)
  File "/opt/conda/envs/myenv/lib/python3.10/site-packages/pymc/sampling/mcmc.py", line 1052, in _mp_sample
    sampler = ps.ParallelSampler(
  File "/opt/conda/envs/myenv/lib/python3.10/site-packages/pymc/sampling/parallel.py", line 402, in __init__
    self._samplers = [
  File "/opt/conda/envs/myenv/lib/python3.10/site-packages/pymc/sampling/parallel.py", line 403, in <listcomp>
    ProcessAdapter(
  File "/opt/conda/envs/myenv/lib/python3.10/site-packages/pymc/sampling/parallel.py", line 259, in __init__
BrokenPipeError: [Errno 32] Broken pipe

And here's the stack trace when I just cancel the sampling after it gets stuck (without the mp part):

Multiprocess sampling (4 chains in 2 jobs)
NUTS: [constant, beta, noisy_score]
^C^CTraceback (most recent call last):-------------------------------------------------------------| 0.00% [0/8000 00:00<00:00 Sampling 4 chains, 0 divergences]
  File "/app/commandline-classifier/implementation/BayesianLogisticRegressionModelFit.py", line 134, in <module>
    fit_object.fit_model()                                                                                      
  File "/app/commandline-classifier/implementation/BayesianLogisticRegressionModelFit.py", line 92, in fit_model
    idata = pm.sample(step=step, chains=4, cores=2)                                                             
  File "/opt/conda/envs/myenv/lib/python3.10/site-packages/pymc/sampling/mcmc.py", line 702, in sample          
    return _sample_return(                                                                                      
  File "/opt/conda/envs/myenv/lib/python3.10/site-packages/pymc/sampling/mcmc.py", line 733, in _sample_return  
    traces, length = _choose_chains(traces, tune)                                                               
  File "/opt/conda/envs/myenv/lib/python3.10/site-packages/pymc/backends/base.py", line 601, in _choose_chains  
    raise ValueError("Not enough samples to build a trace.")                                                    
ValueError: Not enough samples to build a trace. 
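
In case it's relevant: the set_start_method call sits at module level near the top of the script (line 7 in the trace above), roughly like this:

    # top of BayesianLogisticRegressionModelFit.py (roughly reconstructed from the trace)
    import multiprocessing as mp
    mp.set_start_method('forkserver')   # module-level call, line 7 in the trace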

Thanks in advance!

Is it possible to share the code? And what platform are you running on?

Hi,

The code is a simple logistic regression; nothing much to it:

    with pm.Model(coords={"predictors": self.X.columns.values}) as model:
        X = pm.MutableData('X', self.X)
        y = pm.MutableData('y', self.y)

        constant = pm.Normal('constant', mu=-0.5, sigma=0.1)
        beta = pm.Normal('beta', mu=0, sigma=1, dims="predictors")
        score = pm.Deterministic('score', X @ beta)
        noisy_score = pm.Normal('noisy_score', mu=score, sigma=5)
        p = pm.Deterministic('p', pm.math.sigmoid(constant + noisy_score))

        # define likelihood
        observed = pm.Bernoulli('obs', p, observed=y)

        step = pm.NUTS()
        # cores=1 is the workaround; cores=2 with chains=4 (as in the traces above) hangs
        idata = pm.sample(step=step, cores=1)
        idata_prior = pm.sample_prior_predictive(samples=50)

I am running this code inside a Docker container. The image is built FROM continuumio/miniconda3, so it runs a Debian-based distro.

I am running Docker on Windows.

Thanks once again!

For these sorts of issues, it might help to see the entire script rather than just the model code. In particular, multiprocessing on Windows usually requires the standard if __name__ == '__main__': guard. Not sure how that interacts with Docker environments, etc., but it might help someone spot the critical pieces.
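
In case it helps, here's a rough sketch of what that could look like for this script. This is only a guess at the structure (dummy data, simplified model), not the actual BayesianLogisticRegressionModelFit.py. The key point is that mp.set_start_method and the call into pm.sample live under the guard, so they don't run again when the spawn/forkserver child re-imports the main module via runpy; that re-import is what raises the "context has already been set" error in the trace above.

    # Hypothetical, simplified skeleton with the __main__ guard in place
    import multiprocessing as mp

    import numpy as np
    import pandas as pd
    import pymc as pm


    def fit_model(X, y):
        with pm.Model(coords={"predictors": X.columns.values}):
            X_data = pm.MutableData('X', X)
            y_data = pm.MutableData('y', y)
            constant = pm.Normal('constant', mu=-0.5, sigma=0.1)
            beta = pm.Normal('beta', mu=0, sigma=1, dims="predictors")
            p = pm.Deterministic('p', pm.math.sigmoid(constant + X_data @ beta))
            pm.Bernoulli('obs', p, observed=y_data)
            # the multiprocess sampling call that currently hangs / crashes
            return pm.sample(chains=4, cores=2)


    if __name__ == '__main__':
        # everything that touches multiprocessing (including set_start_method)
        # sits under this guard, so it is not executed again when a child
        # process re-imports this module
        mp.set_start_method('forkserver', force=True)

        # dummy data just to make the sketch self-contained
        rng = np.random.default_rng(0)
        X = pd.DataFrame(rng.normal(size=(100, 3)), columns=['a', 'b', 'c'])
        y = (rng.random(100) < 0.5).astype(int)

        idata = fit_model(X, y)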