I have a function, let’s call it model_build(), that does a bunch of stuff, like building a model, sampling to get the posterior, and then doing sample_ppc(). It saves stuff to file but doesn’t return anything. It’s able to run fine for me, and with the sample sizes/etc I’m using, takes ~1.5 GB of memory to run.
However, what I’m doing is calling it in a for loop, with different sets of parameters. What I’m finding is that the memory is growing without bound, until by about the 4th or 5th iteration, it eats up the 8GB of memory my laptop has, and crashes, giving a memory error.
This is confusing to me, because model_build() doesn’t return anything. I would therefore intuitively expect all the memory used inside it to be released once each iteration finishes, so it wouldn’t have to hold data/info/traces/etc. from previous iterations.
But clearly, something is persistent, which makes me think it’s something PyMC3 is doing… anyone have any guesses? thanks!
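One way to confirm that memory really is retained between iterations, rather than just peaking inside each call, is to measure what is still allocated after every call. The sketch below uses the standard library’s tracemalloc. leaky_model_build and clean_model_build are hypothetical stand-ins (not PyMC code) for a function that does or doesn’t leave data behind in a module-level cache. tracemalloc only sees allocations made through Python’s memory allocators, so memory allocated directly by compiled extensions may not show up; watching the process’s resident memory in a system monitor is a useful cross-check.

```python
import gc
import tracemalloc
from typing import Callable, Iterable, List

# Hypothetical module-level cache standing in for anything that outlives a call
_CACHE: List[bytearray] = []


def leaky_model_build(size: int) -> None:
    """Hypothetical stand-in: allocates memory and leaves it referenced by _CACHE."""
    _CACHE.append(bytearray(size))


def clean_model_build(size: int) -> None:
    """Hypothetical stand-in: allocates memory that is freed when the function returns."""
    bytearray(size)  # allocated and immediately discarded


def memory_after_each_call(fn: Callable[..., None], param_sets: Iterable[dict]) -> List[int]:
    """Call fn(**params) for each parameter set and return the bytes still traced afterwards."""
    tracemalloc.start()
    try:
        retained = []
        for params in param_sets:
            fn(**params)
            gc.collect()  # free unreachable reference cycles before measuring
            current, _peak = tracemalloc.get_traced_memory()
            retained.append(current)
        return retained
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    params = [{"size": 10_000_000}] * 4  # four iterations, ~10 MB allocated in each
    print("leaky:", memory_after_each_call(leaky_model_build, params))
    print("clean:", memory_after_each_call(clean_model_build, params))
```

With the leaky stand-in the numbers climb by roughly the allocation size on every iteration; with the clean one they stay roughly flat. A real model_build would be passed in place of the stand-ins.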
PyMC shouldn’t keep anything in memory itself. Theano isn’t always so well behaved, however: it caches a lot of things to make compilation faster, and it also has quite a few reference cycles. You can try adding an import gc; gc.collect() after the models are destroyed, but unfortunately that might only help with some of it…
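Reference cycles are why a function that returns nothing doesn’t guarantee its memory is freed right away. CPython frees most objects as soon as their reference count drops to zero, but objects that refer to each other are only reclaimed by the cyclic garbage collector, which runs only from time to time. The sketch below uses a hypothetical ModelLike class (not a Theano object) to show a cycle outliving the function that created it until gc.collect() runs. It cannot help with memory that is still legitimately referenced, such as entries in a cache.

```python
import gc
import weakref


class ModelLike:
    """Hypothetical stand-in for an object graph that refers back to itself."""

    def __init__(self) -> None:
        self.payload = bytearray(10_000_000)  # ~10 MB, to make the cost visible
        self.me = self  # reference cycle: the refcount can never drop to zero on its own


def build_and_discard() -> weakref.ref:
    model = ModelLike()
    return weakref.ref(model)  # a weak reference does not keep the object alive


def cycle_survives_until_collect() -> tuple:
    """Return (alive after the function returned, alive after gc.collect())."""
    was_enabled = gc.isenabled()
    gc.disable()  # stop the automatic collector so the result is deterministic
    try:
        ref = build_and_discard()
        alive_after_return = ref() is not None
        gc.collect()  # the cycle detector finds the unreachable cycle and frees it
        alive_after_collect = ref() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_return, alive_after_collect


print(cycle_survives_until_collect())
```

This prints (True, False): the object is still alive after build_and_discard() returns and is only freed by the explicit collection.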
Hi, thanks a ton for the response. Since making the post, I did a bit more effective searching and found a few helpful posts describing the same problem:
Like you said, some people there suspect that it could be a Theano thing. One person also mentions trying del and gc.collect(), but without success.
One successful workaround seemed to be using multiprocessing to create a whole separate process, which presumably releases memory more reliably because it’s a genuinely “separate” process rather than just another function call on the stack? I’m not super familiar with the internals, though.
I wasn’t explicitly destroying my models, though (with del or anything else); I just assumed they’d be released when the function that created them finished. Did you mean that I should destroy them some other way?
One of the devs mentioned that he thought this may have been addressed. However, I’m on PyMC3 3.6, Python 3.6.8, Theano 1.0.4 and Ubuntu 18.04. PyMC3 3.7 is available on pip, but I suspect the version isn’t the problem, since that post was a while ago.
No, you don’t need to explicitly delete models with del; that won’t make any difference. Wrapping the fit with multiprocessing should solve the issue completely, because the worker really is a separate Python interpreter with its own address space. So something like this should do the trick (untested):
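```python
import concurrent.futures

def run_model(*args, **kwargs):
    # Run inner_run_model in a separate worker process; the OS reclaims all its memory when the worker exits
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        future = executor.submit(inner_run_model, *args, **kwargs)
        return future.result()  # waits for the worker and re-raises any exception raised inside it
```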
I do feel somewhat bad suggesting a workaround like this, but since I think it’s Theano’s fault…
Are you seeing any issues? From the comments above it is not clear what the problem was. Most users never run into this, or we would hear more reports, so it might have been a very specific model or an old Theano bug that has since been fixed.
We are running a PyMC3 Bayesian model sequentially. In the first iteration we train on the data up to, let’s say, T = N and predict N+1. In the second iteration we use the posteriors as new priors, train on the N+1 data point only, and predict N+2. In the third iteration we again use the posteriors as new priors, train on the N+2 data point only, and predict N+3, and so on.
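The sequential scheme described above can be sketched with a model simple enough to update exactly: an unknown Normal mean with known noise variance and a conjugate Normal prior. This is a swapped-in toy, not the poster’s PyMC3 model, and the function names and default values are hypothetical. In this conjugate case each posterior is exactly another Normal, so updating one point at a time gives the same answer as fitting everything at once, which batch_posterior lets you check. With MCMC, by contrast, the posterior has to be approximated from samples before it can be reused as a prior.

```python
from typing import List, Sequence, Tuple


def normal_update(prior_mean: float, prior_var: float, y: float, noise_var: float) -> Tuple[float, float]:
    """Posterior of an unknown Normal mean after one observation y (noise variance known)."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + y / noise_var)
    return post_mean, post_var


def sequential_fit(initial: Sequence[float], stream: Sequence[float],
                   prior_mean: float = 0.0, prior_var: float = 100.0, noise_var: float = 1.0):
    """Fit on data up to T = N, then for each new point: predict it, then update on it alone."""
    mean, var = prior_mean, prior_var
    for y in initial:  # initial training on the data up to T = N
        mean, var = normal_update(mean, var, y, noise_var)
    predictions: List[Tuple[float, float]] = []
    for y_next in stream:
        # Posterior predictive for the next point: same mean, variance inflated by the noise
        predictions.append((mean, var + noise_var))
        # The current posterior becomes the new prior; train on the new point only
        mean, var = normal_update(mean, var, y_next, noise_var)
    return mean, var, predictions


def batch_posterior(data: Sequence[float], prior_mean: float = 0.0,
                    prior_var: float = 100.0, noise_var: float = 1.0) -> Tuple[float, float]:
    """Posterior from fitting all the data at once, for comparison."""
    post_var = 1.0 / (1.0 / prior_var + len(data) / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var
```

Each step here does a fixed amount of work regardless of how many points came before, which is the behavior one would also hope for from the PyMC version.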
Although each iteration only needs one data point to train on and already has its priors set up, the time taken to fit keeps increasing: from 3 minutes initially to 7, then 10, then 30 to 35 minutes. My guess is that PyMC implicitly holds things in memory, leaving less memory available for fitting the model.
Great point! Since we are only using one data point, the posterior doesn’t change significantly from the priors, which is expected. We have also tracked the priors over time and they stay within the same ballpark. So our assumption is that it might be some kind of memory “leak”.
Another possibility is that the more data is available, the stronger the correlations in the posterior become. If you are using something similar to the approach in the “Updating priors” notebook in the PyMC example gallery, that could also mean the priors become a worse representation of the inferred posteriors, which could even make the model harder to fit. I have an example of the issues this can cause in one of my blog posts: Some dimensionality devils | Oriol unraveled
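To illustrate the point about correlations: if each new prior is built from one parameter’s posterior samples at a time, the priors keep every marginal but effectively treat the parameters as independent. The sketch below simulates that with hypothetical, strongly correlated draws for two parameters, then shuffles one column, which preserves both marginals exactly but removes the dependence between them. It is a plain-Python toy, not the gallery notebook’s code.

```python
import math
import random
from typing import List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlated_samples(n: int, rho: float, rng: random.Random) -> Tuple[List[float], List[float]]:
    """Hypothetical posterior draws for two parameters with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return xs, ys


def marginals_only(xs: List[float], ys: List[float], rng: random.Random) -> Tuple[List[float], List[float]]:
    """Keep each parameter's marginal exactly but pair the draws independently."""
    ys_shuffled = ys[:]
    rng.shuffle(ys_shuffled)  # shuffling one column destroys the joint structure
    return xs[:], ys_shuffled


rng = random.Random(0)
xs, ys = correlated_samples(20_000, rho=0.95, rng=rng)
print("posterior correlation:     ", round(pearson(xs, ys), 2))
print("after per-parameter priors:", round(pearson(*marginals_only(xs, ys, rng)), 2))
```

The first correlation comes out close to 0.95 and the second close to zero, even though every individual parameter looks unchanged.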
In general, I’d suggest you open a new topic with more details on how you are setting up the whole process, as there are many moving pieces that could affect the fitting time.
I second what Oriol said. Another thing you can do is sample the same model over and over again (always recreating it from scratch) and check whether the sampling time keeps increasing. For diagnosing your issue, it shouldn’t matter that you are doing a sort of sequential update.
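A minimal harness for that check might look like the sketch below. It assumes fit_fresh_model is a hypothetical zero-argument function you write yourself that builds a brand-new model and samples it; the median-of-thirds comparison is an illustrative choice, not a PyMC utility. The clock is a parameter only so the helper can be tested with a fake timer.

```python
import statistics
import time
from typing import Callable, List


def time_repeated_fits(fit_fresh_model: Callable[[], None], n_runs: int,
                       clock: Callable[[], float] = time.perf_counter) -> List[float]:
    """Call fit_fresh_model n_runs times and return the wall-clock duration of each call."""
    durations = []
    for _ in range(n_runs):
        start = clock()
        fit_fresh_model()  # should build AND sample a brand-new model every time
        durations.append(clock() - start)
    return durations


def slowdown_ratio(durations: List[float]) -> float:
    """Median duration of the last third of runs divided by the median of the first third."""
    k = max(1, len(durations) // 3)
    return statistics.median(durations[-k:]) / statistics.median(durations[:k])


if __name__ == "__main__":
    def fit_fresh_model() -> None:
        """Hypothetical stand-in: replace with code that builds a NEW model and samples it."""
        sum(i * i for i in range(200_000))

    durations = time_repeated_fits(fit_fresh_model, n_runs=9)
    print(f"slowdown ratio: {slowdown_ratio(durations):.2f}")
```

A ratio close to 1 suggests each fresh fit costs the same; a ratio that keeps climbing even though the model is rebuilt from scratch every time points to something persisting between fits rather than to the sequential updating itself. The first fit may be slower if it includes one-off compilation, which is why the comparison uses medians over several runs.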