Using text backend raises sampler stats error

When I use the text backend, I get an error about it not supporting sampler stats. The following code:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
import pandas as pd
import pymc3 as pm
from pymc3.backends import Text

import os

TRACE_DIR = os.path.join('your_path_here', 'data', 'test_trace')

%matplotlib inline

model = pm.Model()
with model:
    mu1 = pm.Normal("mu1", mu=0, sd=1, shape=2)

with model:
    db = Text(TRACE_DIR)
    trace = pm.sample(2000, tune=1000, init=None, trace=db)

generates the error below. Is there a simple way around this that I’m just not aware of?
Thanks.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-d7ac78a84d78> in <module>()
      1 with model:
      2     db = Text(TRACE_DIR)
----> 3     trace = pm.sample(2000, tune=1000, init=None, trace=db)

/Users/weitzenfeld/Envs/selfmade/lib/python2.7/site-packages/pymc3/sampling.pyc in sample(draws, step, init, n_init, start, trace, chain, njobs, tune, nuts_kwargs, step_kwargs, progressbar, model, random_seed, live_plot, discard_tuned_samples, live_plot_kwargs, **kwargs)
    274     discard = tune if discard_tuned_samples else 0
    275 
--> 276     return sample_func(**sample_args)[discard:]
    277 
    278 

/Users/weitzenfeld/Envs/selfmade/lib/python2.7/site-packages/pymc3/sampling.pyc in _sample(draws, step, start, trace, chain, tune, progressbar, model, random_seed, live_plot, live_plot_kwargs, **kwargs)
    289     try:
    290         strace = None
--> 291         for it, strace in enumerate(sampling):
    292             if live_plot:
    293                 if live_plot_kwargs is None:

/Users/weitzenfeld/Envs/selfmade/lib/python2.7/site-packages/tqdm/_tqdm.pyc in __iter__(self)
    860 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
    861 
--> 862             for obj in iterable:
    863                 yield obj
    864                 # Update and print the progressbar.

/Users/weitzenfeld/Envs/selfmade/lib/python2.7/site-packages/pymc3/sampling.pyc in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
    407         strace.close()
    408         if hasattr(step, 'report'):
--> 409             step.report._finalize(strace)
    410 
    411 

/Users/weitzenfeld/Envs/selfmade/lib/python2.7/site-packages/pymc3/step_methods/hmc/nuts.pyc in _finalize(self, strace)
    460         self._chain_id = strace.chain
    461 
--> 462         tuning = strace.get_sampler_stats('tune')
    463         if tuning.ndim == 2:
    464             tuning = np.any(tuning, axis=-1)

/Users/weitzenfeld/Envs/selfmade/lib/python2.7/site-packages/pymc3/backends/base.pyc in get_sampler_stats(self, varname, sampler_idx, burn, thin)
    158         """
    159         if not self.supports_sampler_stats:
--> 160             raise ValueError("This backend does not support sampler stats")
    161 
    162         if sampler_idx is not None:

ValueError: This backend does not support sampler stats

Yep, the text backend does not support sampler stats. You can try pickling the trace instead.
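
For example, a minimal sketch of the pickle route, sampling with the default in-memory backend and serializing afterwards (assuming the trace object pickles cleanly, which it should for the default backend; trace.pkl is just a placeholder name):

import pickle

with model:
    # sample with the default in-memory NDArray backend
    trace = pm.sample(2000, tune=1000, init=None)

# serialize the finished trace to disk
with open('trace.pkl', 'wb') as f:
    pickle.dump(trace, f)

# later, load it back
with open('trace.pkl', 'rb') as f:
    trace = pickle.load(f)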

It seems like the NUTS sampler automatically calls .get_sampler_stats when finalizing its report, so I think this means it’s impossible to use the text backend with NUTS.

That is a legit bug, then – I can try to fix that.

Out of curiosity, is there a reason you’re using the Text backend? It might be time to deprecate these backends, since I’m not sure of a use case that isn’t better covered by the standard NDArray, if we add .save and .load methods to it.

I think the conversation was to add a deprecation warning and continue supporting backends through a 3.2 release, and then get rid of them for 3.3.

I’m using the text backend because I’m running a largish model in a low-memory environment: an EC2 instance with only 3GB of RAM.
I’m not sure what the implications of adding .save and .load methods to NDArray are, but as long as there’s a way to run inference without storing the traces in memory, I’m happy.

Thanks! I’ll keep that in mind.

Back of the envelope, 3GB is ~750M float32s, but I suppose 1M samples of 100 variables isn’t ridiculous, along with some other overhead.
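
Spelling the arithmetic out, assuming 4-byte float32 draws:

ram = 3e9                 # 3 GB of RAM
print(ram / 4)            # ~750M float32 values (4 bytes each)

draws = 1e6 * 100         # 1M samples of 100 variables
print(draws * 4 / 1e9)    # ~0.4 GB for the raw draws alone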

Frankly, I’m impressed you could install numpy/scipy on a machine of that size – I guess python wheels help?

Your back-of-the-envelope math got me curious, so I went back and looked at my logs. It turns out I had used the text backend from the get-go, assuming I would otherwise run into memory issues, and that I was actually hitting memory errors here:

    ris = trace.get_values('ris_level_offset', burn=1000, thin=2)
  File "/home/ubuntu/.virtualenvs/selfmade/local/lib/python2.7/site-packages/pymc3/backends/base.py", line 245, in get_values
    for chain in chains]
  File "/home/ubuntu/.virtualenvs/selfmade/local/lib/python2.7/site-packages/pymc3/backends/text.py", line 125, in get_values
    self._load_df()
  File "/home/ubuntu/.virtualenvs/selfmade/local/lib/python2.7/site-packages/pymc3/backends/text.py", line 104, in _load_df
    self.df = pd.read_csv(self.filename)
  File "/home/ubuntu/.virtualenvs/selfmade/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/ubuntu/.virtualenvs/selfmade/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 401, in _read
    data = parser.read()
  File "/home/ubuntu/.virtualenvs/selfmade/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 957, in read
    df = DataFrame(col_dict, columns=columns, index=index)
  File "/home/ubuntu/.virtualenvs/selfmade/local/lib/python2.7/site-packages/pandas/core/frame.py", line 266, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/home/ubuntu/.virtualenvs/selfmade/local/lib/python2.7/site-packages/pandas/core/frame.py", line 402, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/home/ubuntu/.virtualenvs/selfmade/local/lib/python2.7/site-packages/pandas/core/frame.py", line 5408, in _arrays_to_mgr
    return create_block_manager_from_arrays(arrays, arr_names, axes)
  File "/home/ubuntu/.virtualenvs/selfmade/local/lib/python2.7/site-packages/pandas/core/internals.py", line 4262, in create_block_manager_from_arrays
    blocks = form_blocks(arrays, names, axes)
  File "/home/ubuntu/.virtualenvs/selfmade/local/lib/python2.7/site-packages/pandas/core/internals.py", line 4331, in form_blocks
    float_blocks = _multi_blockify(float_items)
  File "/home/ubuntu/.virtualenvs/selfmade/local/lib/python2.7/site-packages/pandas/core/internals.py", line 4408, in _multi_blockify
    values, placement = _stack_arrays(list(tup_block), dtype)
  File "/home/ubuntu/.virtualenvs/selfmade/local/lib/python2.7/site-packages/pandas/core/internals.py", line 4451, in _stack_arrays
    stacked = np.empty(shape, dtype=dtype)
MemoryError

I can try without using the text backend to see if I actually do get memory errors or if I was barking up the wrong tree.

As for numpy/scipy - I’ve installed it on even smaller instance types. This trick has come in handy.

You could also use the HDF5 backend, which should support sampler stats too. The backends aren’t tested particularly well, however (as you have seen yourself :frowning: )
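
A sketch of what that could look like, reusing model and TRACE_DIR from the original post; I’m assuming the backend class is importable as below and takes a file path, so check the import path against your installed pymc3 version:

import os
from pymc3.backends.hdf5 import HDF5  # import path may differ by version

with model:
    # store draws in an HDF5 file on disk instead of in memory
    db = HDF5(os.path.join(TRACE_DIR, 'trace.h5'))
    trace = pm.sample(2000, tune=1000, init=None, trace=db)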

It looks like you will get a memory error when you load the trace anyway, right? Something like this might require custom code on your part. It seems like the text backend just stores things as a CSV, so grabbing the file and streaming it with something like the standard library’s csv.DictReader will keep memory at O(1).
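
Roughly like this; the chain-0.csv file name and the column name are assumptions about how the text backend lays out its output:

import csv
import os

# stream one variable's draws without loading the whole CSV:
# a running mean in constant memory
total, count = 0.0, 0
with open(os.path.join(TRACE_DIR, 'chain-0.csv')) as f:  # assumed file layout
    for row in csv.DictReader(f):
        total += float(row['ris_level_offset__0'])  # hypothetical column name
        count += 1
print(total / count)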

I have also heard that passing column data types to pd.read_csv makes it much more efficient, but I don’t know enough about pandas internals to know whether that’s memory or speed efficiency.
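
For what it’s worth, dtype and chunksize are both supported by pd.read_csv, so something like this would bound peak memory by the chunk size (the file name is again an assumption):

import os
import pandas as pd

# read the trace in fixed-size chunks with an explicit dtype,
# so pandas skips type inference and never holds the full file
chunks = pd.read_csv(os.path.join(TRACE_DIR, 'chain-0.csv'),
                     dtype='float32', chunksize=10000)
for chunk in chunks:
    pass  # process each chunk here, e.g. accumulate running summaries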

In any case, working on at least fixing the error now!

This will at least get rid of the error – I spent an hour trying to implement sampler stats, but it was a little bit non-trivial, and I would rather not spend too much time on a feature that could easily disappear in a few months.

https://github.com/pymc-devs/pymc3/pull/2431

Awesome, thank you Colin!