Does PyMC implicitly hold things in memory? Problem with memory growing without bound

I have a function, let’s call it model_build(), that does a bunch of stuff, like building a model, sampling to get the posterior, and then doing sample_ppc(). It saves stuff to file but doesn’t return anything. It’s able to run fine for me, and with the sample sizes/etc I’m using, takes ~1.5 GB of memory to run.

However, what I’m doing is calling it in a for loop, with different sets of parameters. What I’m finding is that the memory is growing without bound, until by about the 4th or 5th iteration, it eats up the 8GB of memory my laptop has, and crashes, giving a memory error.

This is confusing to me, because model_build() doesn’t return anything – therefore, I would intuitively guess that anything that happens internally releases all the memory used when the iteration is done, so it doesn’t have to hold data/info/traces/etc from previous iterations.

But clearly, something is persistent, which makes me think it’s something PyMC3 is doing… anyone have any guesses? thanks!

PyMC3 shouldn’t keep anything in memory itself. Theano isn’t always so nice, however: it caches a lot of things to make compilation faster, and it also has quite a few reference cycles. You can try adding import gc; gc.collect() after the models are destroyed, but unfortunately that might only help with some things…
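To make the gc.collect() suggestion concrete, here is a minimal sketch of the loop. It assumes model_build accepts its parameters as keyword arguments; run_all and param_sets are hypothetical names used only for illustration. It only frees objects that are unreachable apart from reference cycles, so it will not release anything Theano is deliberately caching.

```python
import gc


def run_all(param_sets, model_build):
    """Call model_build once per parameter set, collecting garbage in between.

    run_all and param_sets are illustrative names, not part of PyMC3.
    Returns the number of unreachable objects found after each iteration.
    """
    collected = []
    for params in param_sets:
        model_build(**params)
        # Force a full collection so that objects kept alive only by
        # reference cycles are freed before the next model is built.
        collected.append(gc.collect())
    return collected
```

If the counts come back small while memory keeps climbing, the leftover memory is probably still referenced somewhere (for example by a cache), and collecting will not help.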

Hi, thanks a ton for the response. Since making the post, I did a bit more effective searching and found a few helpful posts that had the same problem:

Like you said, some people there suspect that it could be a Theano thing. One person also mentions trying del and gc.collect(), but with no success.

One successful workaround seemed to be using multiprocessing to run the work in a whole separate process. I’m guessing that releases memory more reliably because everything the process used is handed back when it exits, which isn’t true of just another function call on the stack? I’m not super familiar with the internals though.

I wasn’t explicitly destroying my models though (with del or something), I just assumed they’d be released when the function that created them finished. Did you mean that I should destroy them some other way?

thanks for a great package!


Oh, one more thing: I just found this thread: https://github.com/pymc-devs/pymc3/issues/2646

One of the devs mentioned that he thought this may have been addressed. However, I have PyMC3 3.6, Python 3.6.8, Theano 1.0.4, Ubuntu 18.04. I guess there’s PyMC3 3.7 on pip, but I suspect upgrading wouldn’t fix it, since that post was a while ago.

No, you don’t need to explicitly delete models with del; that won’t make any difference. Wrapping the fit with multiprocessing should completely solve the issue, because it really is a separate Python interpreter with a different address space. So something like this should do the trick (untested):

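import concurrent.futures

# inner_run_model is your existing function that builds and fits the model.
def run_model(*args, **kwargs):
    # A fresh single-worker pool means the fit runs in its own process.
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        future = executor.submit(inner_run_model, *args, **kwargs)
        # Block until the worker finishes; re-raises any exception it hit.
        return future.result()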

I feel somewhat bad suggesting a workaround like this, but I think it is Theano’s fault…
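For anyone wiring this into a loop, here is a self-contained sketch of how the wrapper might be used. The body of inner_run_model below is a hypothetical stand-in (the real one would build the model, sample, and save to file), and the n_samples parameter and its values are made up for illustration.

```python
import concurrent.futures
import os


def inner_run_model(n_samples):
    """Hypothetical stand-in for the real model-building function."""
    scratch = [float(i) for i in range(n_samples)]  # placeholder workload
    return os.getpid(), len(scratch)


def run_model(*args, **kwargs):
    """Run inner_run_model in a fresh worker process and wait for it."""
    # Leaving the with-block shuts the worker down, so whatever memory
    # it allocated goes back to the operating system.
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        future = executor.submit(inner_run_model, *args, **kwargs)
        return future.result()


if __name__ == "__main__":
    for n_samples in (1_000, 2_000, 3_000):  # illustrative parameter sets
        worker_pid, n_done = run_model(n_samples)
        print(f"parent {os.getpid()}: worker {worker_pid} handled {n_done} samples")
```

Two things to keep in mind: the arguments and the return value are pickled to cross the process boundary, so they need to be picklable, and inner_run_model has to be defined at module top level so the worker can find it. The if __name__ == "__main__" guard matters on platforms that start workers by re-importing the script.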


Hey, thanks, that worked great. It’s obviously not ideal, but it was an easy workaround for now.

minor typo for future users who find this:

future = executor.submit(inner_run_model, *args, **kwargs)

thanks again!