Backends don't work with a simple setup when sampling sequentially


#1

I am trying out different backends for pymc3 to deal with an extremely large trace file that we have. Before you ask me to shrink the model: one of the parameters alone has over 2000 values, so the trace is just going to be large. That said, none of the standard backends seem to work with pymc3==3.3.

I was going to create an issue on GitHub, but I wanted to check first whether I have something wrong in this setup. Any help would be greatly appreciated. Sampling without a backend works:

17:20 $ python
Python 3.6.1 (default, Nov  8 2017, 14:29:33)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymc3 as pm
>>> model = pm.Model()
>>> with model:
...     a = pm.Normal('a', mu=0, sd=1)
...     trace = pm.sample(1000, n_init=1000, cores=1, njobs=1)
...
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [a]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1500/1500 [00:01<00:00, 1445.02it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1500/1500 [00:00<00:00, 2644.67it/s]

Text Backend Fails:

17:11 $ python
Python 3.6.1 (default, Nov  8 2017, 14:29:33)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymc3 as pm
>>> model = pm.Model()
>>> with model:
...     a = pm.Normal('a', mu=0, sd=1)
...     db_text = pm.backends.Text("text-test-42")
...     trace = pm.sample(1000, n_init=1000, trace=db_text, cores=1, njobs=1)
...
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [a]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1500/1500 [00:00<00:00, 2404.39it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1500/1500 [00:00<00:00, 2485.21it/s]
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/Users/orion.delwaterman/.pyenv/versions/3.6.1/envs/hiring-horizons/lib/python3.6/site-packages/pymc3/sampling.py", line 439, in sample
    trace = _sample_many(**sample_args)
  File "/Users/orion.delwaterman/.pyenv/versions/3.6.1/envs/hiring-horizons/lib/python3.6/site-packages/pymc3/sampling.py", line 494, in _sample_many
    return MultiTrace(traces)
  File "/Users/orion.delwaterman/.pyenv/versions/3.6.1/envs/hiring-horizons/lib/python3.6/site-packages/pymc3/backends/base.py", line 265, in __init__
    raise ValueError("Chains are not unique.")
ValueError: Chains are not unique.

SQLite Fails:

17:16 $ python
Python 3.6.1 (default, Nov  8 2017, 14:29:33)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymc3 as pm
>>> model = pm.Model()
>>> with model:
...     a = pm.Normal('a', mu=0, sd=1)
...     db_sqllite = pm.backends.SQLite("test-sqllite")
...     trace = pm.sample(1000, n_init=1000, trace=db_sqllite, cores=1, njobs=1)
...
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [a]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1500/1500 [00:00<00:00, 2328.49it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1500/1500 [00:00<00:00, 2453.27it/s]
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/Users/orion.delwaterman/.pyenv/versions/3.6.1/envs/hiring-horizons/lib/python3.6/site-packages/pymc3/sampling.py", line 439, in sample
    trace = _sample_many(**sample_args)
  File "/Users/orion.delwaterman/.pyenv/versions/3.6.1/envs/hiring-horizons/lib/python3.6/site-packages/pymc3/sampling.py", line 494, in _sample_many
    return MultiTrace(traces)
  File "/Users/orion.delwaterman/.pyenv/versions/3.6.1/envs/hiring-horizons/lib/python3.6/site-packages/pymc3/backends/base.py", line 265, in __init__
    raise ValueError("Chains are not unique.")
ValueError: Chains are not unique.

HDF5 Fails:

16:55 $ python
Python 3.6.1 (default, Nov  8 2017, 14:29:33)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymc3 as pm
>>> model = pm.Model()
>>> with model:
...     a = pm.Normal('a', mu=0, sd=1)
...     db = pm.backends.HDF5('test-hdf5-3')
...     trace = pm.sample(1000, n_init=1000, trace=db, cores=1, njobs=1)
...
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [a]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1500/1500 [00:07<00:00, 193.66it/s]
  0%|                                                                                                                                                     | 0/1500 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/Users/orion.delwaterman/.pyenv/versions/3.6.1/envs/hiring-horizons/lib/python3.6/site-packages/pymc3/sampling.py", line 439, in sample
    trace = _sample_many(**sample_args)
  File "/Users/orion.delwaterman/.pyenv/versions/3.6.1/envs/hiring-horizons/lib/python3.6/site-packages/pymc3/sampling.py", line 482, in _sample_many
    step=step, random_seed=random_seed[i], **kwargs)
  File "/Users/orion.delwaterman/.pyenv/versions/3.6.1/envs/hiring-horizons/lib/python3.6/site-packages/pymc3/sampling.py", line 526, in _sample
    for it, strace in enumerate(sampling):
  File "/Users/orion.delwaterman/.pyenv/versions/3.6.1/envs/hiring-horizons/lib/python3.6/site-packages/tqdm/_tqdm.py", line 862, in __iter__
    for obj in iterable:
  File "/Users/orion.delwaterman/.pyenv/versions/3.6.1/envs/hiring-horizons/lib/python3.6/site-packages/pymc3/sampling.py", line 614, in _iter_sample
    strace.setup(draws, chain, step.stats_dtypes)
  File "/Users/orion.delwaterman/.pyenv/versions/3.6.1/envs/hiring-horizons/lib/python3.6/site-packages/pymc3/backends/hdf5.py", line 154, in setup
    self._set_sampler_vars(sampler_vars)
  File "/Users/orion.delwaterman/.pyenv/versions/3.6.1/envs/hiring-horizons/lib/python3.6/site-packages/pymc3/backends/base.py", line 80, in _set_sampler_vars
    raise ValueError("Can't change sampler_vars")
ValueError: Can't change sampler_vars
>>> model.unobserved_RVs
[a]

#2

Yep, I can confirm it is a bug. It seems to be related to sampling multiple chains sequentially. Could you please open an issue?
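For what it's worth, here is a toy sketch of what I suspect is going on. It is a guess from the tracebacks, not pymc3's actual code: if the one backend object passed via trace= is set up again for every chain, then the list handed to MultiTrace holds the same object twice, and both entries report the same chain number. That would also fit the HDF5 run, which fails in setup right at the start of the second chain. FakeBackend, combine and sample_sequentially are made-up names for illustration only.

```python
class FakeBackend:
    """Hypothetical stand-in for a trace backend.

    It only remembers which chain it was last set up for.
    """

    def __init__(self, name):
        self.name = name
        self.chain = None

    def setup(self, draws, chain):
        # Each call overwrites the chain number from the previous call.
        self.chain = chain


def combine(straces):
    """Mimic a uniqueness check like the one that raises in the traceback."""
    by_chain = {}
    for strace in straces:
        if strace.chain in by_chain:
            raise ValueError("Chains are not unique.")
        by_chain[strace.chain] = strace
    return by_chain


def sample_sequentially(backend, chains=2, draws=1000):
    """Assumed behaviour: reuse one backend object for every chain in turn."""
    traces = []
    for chain in range(chains):
        backend.setup(draws, chain)  # same object, reconfigured per chain
        traces.append(backend)       # so the list holds it more than once
    return combine(traces)


if __name__ == "__main__":
    try:
        sample_sequentially(FakeBackend("text-test-42"), chains=2)
    except ValueError as err:
        print("two chains:", err)
    print("one chain:", list(sample_sequentially(FakeBackend("text-test-42"), chains=1)))
```

With two chains the toy version raises the same message as the Text and SQLite runs; with one chain it goes through.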

Also, small tips:

  • n_init=1000 is not doing anything here. It only applies to other initialization methods (such as ADVI), not to the jitter+adapt_diag used in your runs. Use tune=1000 to set the number of tuning steps instead.
  • You don't need both cores and njobs. If you are on master, use cores. And if you want to sample only one chain, set chains=1.
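
Putting both tips together, the arguments for pm.sample could be assembled roughly like this. It is only a sketch: sample_kwargs and its on_master flag are hypothetical helpers, while draws, tune, chains, cores and njobs are the pm.sample parameters discussed above. Whether a single chain avoids the backend errors is not something the sessions above show.

```python
def sample_kwargs(draws=1000, tune=1000, chains=1, on_master=True):
    """Build keyword arguments for pm.sample following the tips above.

    Hypothetical helper, not part of pymc3.
    """
    kwargs = {"draws": draws, "tune": tune, "chains": chains}
    # Pass only one of cores / njobs: cores on master, njobs on 3.3.
    kwargs["cores" if on_master else "njobs"] = 1
    return kwargs


if __name__ == "__main__":
    print(sample_kwargs(on_master=False))

# Usage inside a model context (requires pymc3):
#     with model:
#         trace = pm.sample(trace=db_text, **sample_kwargs(on_master=False))
```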