Nutpie sampling freezes

I tried nutpie for the first time on my rather large model. nutpie compiled the model without reporting any difficulty. But sampling resulted in a curious failure: it starts sampling, but freezes after two minutes. No new draws, no divergences, no updates to estimated time to completion. It just freezes.

Activity Monitor shows all four cores still running at 100%. Memory usage climbs and climbs, starting at 5.4 GB when the freeze starts and rising from there to above 20 GB. (Using NUTS, memory usage peaks below 1 GB.)

And then something happens. CPU usage drops from 400% (four cores at 100% each) to 100%. The increase in memory slows down, creeping upward instead of rushing. But there are no signs of sampler progress. Still frozen.

Once memory usage surpassed my 24 GB of physical memory, I stopped the madness and interrupted the Jupyter kernel.

What is happening? To use nutpie, do I need a lot more RAM?

Details:
pymc 5.25.1
numba 0.62.1
nutpie 0.15.2
Apple Silicon M2 (arm64)
macOS 15.6.1


Any chance you can share a reproducible example? Will be hard to figure out the problem without more details.

CC @aseyboldt anyway

The missing progress updates were a strange problem where IPython would use lots of file descriptors and run out of them when called from a Rust thread (Python thread local variables are reset in callbacks from a rust thread · Issue #5467 · PyO3/pyo3 · GitHub).

It should be fixed in nutpie version 0.16.0, which is already out on PyPI (the conda-forge package is currently building, but should be available very soon).

So what was happening is that the progress bar was stuck while the sampler itself was still running.

I’m not sure about the memory usage; if you have a very large number of parameters or huge deterministic variables, it will simply use a lot of memory to store the trace.
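As a rough illustration (the numbers here are hypothetical, assuming float64 draws and counting only the posterior group), the trace size scales as chains × draws × total flattened parameter size:

chains = 4
draws = 1000
n_flat_params = 250_000  # hypothetical: total length of all flattened variables

# 8 bytes per float64 draw; warmup draws and sampler stats come on top of this
bytes_total = chains * draws * n_flat_params * 8
print(f"~{bytes_total / 1e9:.1f} GB")  # ~8.0 GB for these numbers
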

Version 0.16.0 also contains a new feature that should help with high memory requirements: You can now directly write the trace to a zarr file like so:

import nutpie
from pathlib import Path

# Create a directory for the on-disk trace
path = Path("trace.zarr")
path.mkdir()

store = nutpie.zarr_store.LocalStore(str(path))
compiled = nutpie.compile_pymc_model(model)  # model is your PyMC model
tr = nutpie.sample(compiled, zarr_store=store)

This avoids having to keep the trace in memory.

Operations on the resulting trace will typically be slower with this, because data has to be loaded into memory first. You can explicitly trigger loading data into memory with something like

tr.posterior.some_variable.load()

The zarr support is still very new however. I’d appreciate it if you could let us know if you run into any problems.


That works. Not sure about the memory usage, but that seems to be a different issue.

How big is the trace itself? If it is much smaller than the memory usage you observed, it would be great if you could give us some more details about what you were doing, or a reproducing example.

The posterior is 8.1 GB. (!) Is that much smaller? The peak memory usage was north of 24 GB.

Curiously, the posterior of the NUTS sampling is only (!) 3.9 GB.

Is it the same number of iterations and chains? Are you maybe saving warmup iterations with nutpie? I don’t know the format of the output, but if it’s something like CSV, are results being printed with the same number of digits?

Same number of iterations and chains. I am not explicitly saving the warmup (tuning) by setting discard_tuned_samples to False. But it is conceivable that the default behavior for nutpie.sample() is different from the default behavior for pm.sample(). (There does not seem to be documentation for nutpie.sample(), perhaps because it is largely the same as pm.sample().)

The documentation for sample says:

Tuning samples will be drawn in addition to the number specified in the draws argument, and will be discarded unless discard_tuned_samples is set to False.

That is, you need to set the value to True to not save the tuning draws.
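
For concreteness, a minimal sketch of this path through pm.sample (model stands in for your PyMC model; nuts_sampler="nutpie" routes sampling through nutpie instead of calling nutpie.sample directly):

import pymc as pm

with model:
    # discard_tuned_samples=True is the default, so the tuning draws are
    # dropped from the returned InferenceData.
    idata = pm.sample(
        draws=1000,
        tune=1000,
        discard_tuned_samples=True,
        nuts_sampler="nutpie",
    )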

It’s probably too late for sample(), but in general it helps to keep all keywords in positive polarity. For example, I think it would have been clearer if the keyword argument to sample() were named save_tuning_draws, with documentation explaining that the tuning draws will be saved only if it is set to True.


I see your point about keeping keywords positive polarity.

But I left this keyword at its default value, discarding the tuned samples.

Aha. I now see how to read what you wrote: you’re saying you did not set discard_tuned_samples to False. I read it as you saying that you were not saving warmup because you had set discard_tuned_samples to False. I used to be a linguist who worked on semantic ambiguity like this 🙂.


That’s a good rule of thumb to keep in hand.

There are docs for nutpie.sample: sampling-options – Nutpie
That hasn’t been around all that long, so I definitely can’t blame you for not knowing. 🙂

Unfortunately, the behavior for store_warmup is a bit dumb in nutpie. It stores the warmup draws no matter what that argument says, and only discards them at the very end, when it converts the trace to ArviZ. This isn’t all that much work to fix, but other things have just stayed higher on the todo list for a while…
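
As a sketch of how that argument is meant to be used (same hypothetical compiled model as in the earlier snippet; the memory caveat above still applies):

import nutpie

compiled = nutpie.compile_pymc_model(model)  # model is your PyMC model

# Don't keep warmup draws in the returned trace. Note that, as described
# above, the warmup draws are currently still held in memory during sampling
# and only dropped when the trace is converted to ArviZ.
tr = nutpie.sample(compiled, store_warmup=False)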

The conversion to ArviZ also copies the data once (also one of those tasks on the todo list…), so peak memory usage will be about twice the total trace size, including warmup draws and sampler stats. If you have a large trace, the zarr storage backend makes much more sense, since it avoids having to hold the trace in memory at all.
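
As a back-of-envelope check against the numbers above (assuming tune ≈ draws, so the warmup draws take roughly as much space as the posterior; the sampler-stats figure is a guess):

posterior_gb = 8.1        # reported size of the posterior group
warmup_gb = posterior_gb  # assumption: tune == draws
stats_gb = 1.0            # hypothetical allowance for sampler stats

trace_gb = posterior_gb + warmup_gb + stats_gb
peak_gb = 2 * trace_gb    # extra copy during the conversion to ArviZ
print(peak_gb)            # ~34 GB, easily past 24 GB of physical RAM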
