Specifying the number of chains: 'chains' vs 'njobs'

If I use

trace_beta1 = pm.sample(1000, step=step, start=start, njobs=10, tune=1000)

ten chains are indeed produced. However, if I use

trace_beta1 = pm.sample(1000, start=start, chains=10, tune=1000)

only one chain appears to be produced, e.g. afterwards trying

pm.gelman_rubin(chain_beta1)

gives ValueError: Gelman-Rubin diagnostic requires multiple chains of the same length, which implies only a single chain.

It would appear logical that chains should specify the number of chains to produce, and njobs should specify how many should be produced at one time.

What does chains actually do, and why do I seemingly have to use njobs to produce ‘parallel chains’ (is this the right term?)?

Hmm that really shouldn’t be the case, and your understanding is correct. What is your PyMC3 version?

Version 3.2. Here’s the code:

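import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
from scipy import stats
import seaborn as sns

# `data` is assumed to be a 1-D array of 0/1 observations defined earlier (not shown in this post)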

with pm.Model() as our_first_model:
	theta = pm.Beta('theta', alpha=1, beta=1)
	y = pm.Bernoulli('y', p=theta, observed=data)
	start = pm.find_MAP()
	step = pm.Metropolis()
	trace = pm.sample(1000, step=step, start=start, chains=10)
	
burnin = 100
chain = trace[burnin:]
pm.gelman_rubin(chain)

Output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-3e954059917e> in <module>()
----> 1 pm.gelman_rubin(chain)

C:\ProgramData\Anaconda3\lib\site-packages\pymc3\diagnostics.py in gelman_rubin(mtrace, varnames, include_transformed)
	144     if mtrace.nchains < 2:
	145         raise ValueError(
--> 146             'Gelman-Rubin diagnostic requires multiple chains '
	147             'of the same length.')
	148

ValueError: Gelman-Rubin diagnostic requires multiple chains of the same length.

I see. Yes, this changed recently. Could you please update to master?
pip install git+https://github.com/pymc-devs/pymc3

Edit: Success! Though I had to use pip install --upgrade git+https://github.com/pymc-devs/pymc3

More odd behaviour. Running the Rugby model at http://docs.pymc.io/notebooks/rugby_analytics.html, I end up with four chains:

In [9]: trace
Out[9]: <MultiTrace: 4 chains, 1000 iterations, 10 variables>

I updated from git again today and also found that I had to use tune=2000, because with tune=1000 I was getting the following warning:

D:\Continuum\Anaconda3\lib\site-packages\pymc3\step_methods\hmc\nuts.py:452:
UserWarning: The acceptance probability in chain 3 does not match the target.
It is 0.0251218722568, but should be close to 0.8. Try to increase the number of tuning steps.

Edit: using trace = pm.sample(1000, tune=2000, chains=1) results in a single chain, so it looks like something has set the default number of chains to four.

Yes, you are right: the default number of chains is now 4 (running multiple chains is important for model diagnostics).
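To illustrate why a diagnostic like Gelman-Rubin needs more than one chain, here is a minimal NumPy sketch of the classic (non-split) R-hat for a single scalar parameter. It is not PyMC3's implementation; the function name gelman_rubin_sketch and the simulated draws are assumptions made purely for illustration.

```python
import numpy as np


def gelman_rubin_sketch(chains):
    """Classic R-hat for one scalar parameter (illustrative, not PyMC3's code).

    `chains` is array-like with shape (n_chains, n_draws).
    """
    chains = np.asarray(chains, dtype=float)
    if chains.ndim != 2 or chains.shape[0] < 2:
        # With a single chain there is no between-chain variance to compare.
        raise ValueError("need at least two chains of the same length")
    n = chains.shape[1]
    # B: between-chain variance, from the spread of the per-chain means.
    B = n * chains.mean(axis=1).var(ddof=1)
    # W: within-chain variance, averaged over chains.
    W = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)


rng = np.random.default_rng(0)
# Four simulated chains drawn from the same distribution: R-hat should be close to 1.
well_mixed = rng.normal(size=(4, 1000))
# The same chains shifted apart, mimicking chains stuck in different regions.
disagreeing = well_mixed + np.arange(4)[:, None]

print(gelman_rubin_sketch(well_mixed))
print(gelman_rubin_sketch(disagreeing))
```

The statistic compares variance between chains with variance within chains, so with only one chain the comparison is undefined, which is consistent with the ValueError shown earlier in the thread.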


I’ve submitted a PR to improve the docstring for chains. When chains is not set, it will select the higher of njobs or 2. Most of the time you will want to sample in parallel to accommodate the Gelman-Rubin diagnostic calculation. So, when you set njobs to 1, there will still be 2 chains sampled; they will just be sampled in serial (unless you set chains to 1 as well).
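The rule described in this post can be sketched in a few lines of plain Python. The helper resolve_chains is hypothetical and not part of PyMC3; it only mirrors the behaviour as stated here, and the actual default may differ between versions (an earlier reply in this thread mentions a default of 4).

```python
def resolve_chains(chains=None, njobs=1):
    """Hypothetical helper mirroring the rule described in the post.

    If `chains` is given explicitly, use it; otherwise take the larger
    of `njobs` and 2 so that multi-chain diagnostics remain possible.
    """
    if chains is None:
        return max(njobs, 2)
    return chains


print(resolve_chains(njobs=1))            # chains not set, serial sampling
print(resolve_chains(njobs=10))           # chains not set, ten parallel jobs
print(resolve_chains(chains=1, njobs=1))  # explicitly asking for one chain
```

Under this reading, njobs controls how many chains run at once, while chains controls how many are produced in total.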

Note, however, that when you ask for 1000 samples (by setting iterations=1000), you will get 1000 samples; they will just be broken out over however many chains are specified.