Memory leak for GP prediction

I’m interested in fitting a GP once and then using the object to make new predictions many times. The following script shows that the memory footprint of the process grows with every call to gp.predict.

import os
import psutil
import numpy as np
import pymc3 as pm

# toy data: 100 one-dimensional inputs and targets
X = np.random.randn(100, 1)
y = np.random.randn(100)

with pm.Model() as model:
    ls = pm.Uniform('ls', lower=0.1, upper=4.0)
    matern = pm.gp.cov.Matern52(1, ls=ls, active_dims=[0])
    gp = pm.gp.Marginal(cov_func=matern)
    ll = gp.marginal_likelihood('ll', X, y, noise=1.0, is_observed=True)
    trace = pm.sample(chains=1, cores=1, tune=2, draws=2)

    Xnew = np.random.randn(200, 1)
    process = psutil.Process(os.getpid())
    for i in range(30):
        # every call to gp.predict grows the resident memory of the process
        _ = gp.predict(Xnew, point=trace[0], diag=False)
        print('Memory usage (GB):', process.memory_info().rss / 1_000_000_000)

Using tracemalloc points to a large amount of expanding memory usage in this part of the Theano codebase, but I can’t make heads or tails of what’s going on in there.
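For what it’s worth, this is roughly how I narrowed it down (a minimal tracemalloc sketch dropped into the loop of the script above; the frame count and the number of entries printed are arbitrary):

import tracemalloc

tracemalloc.start(25)  # keep up to 25 frames per allocation so the culprit is visible

snap_before = tracemalloc.take_snapshot()
_ = gp.predict(Xnew, point=trace[0], diag=False)
snap_after = tracemalloc.take_snapshot()

# show the allocation sites that grew the most across a single prediction call
for stat in snap_after.compare_to(snap_before, 'traceback')[:5]:
    print(stat)
    for line in stat.traceback.format():
        print(line)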

Does anyone have an idea for a workaround to reuse the GP for prediction multiple times? I’ve struggled with creating a compiled Theano function from the symbolic output of gp.predictt, as that generates test-value errors.

I can’t be of much help here, but you can disable test-value errors during Theano function compilation.

Something like theano.config.compute_test_value = 'off' iirc
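A rough, untested sketch of what I mean (this assumes pymc3 3.x, where gp.predictt returns the symbolic predictive mean and covariance, that model, gp, and trace come from the script above, and that nothing resets the flag afterwards):

import numpy as np
import theano

# turn off test-value computation so compiling the symbolic predictor doesn't complain
theano.config.compute_test_value = 'off'

with model:
    Xnew = np.random.randn(200, 1)
    # symbolic predictive mean and covariance at Xnew
    mu, cov = gp.predictt(Xnew, diag=False)

# compile once, taking the model's free variables (e.g. ls_interval__) as inputs
free_vars = model.free_RVs
predict_fn = theano.function(free_vars, [mu, cov], on_unused_input='ignore')

# reuse the compiled function as many times as needed without rebuilding the graph
point = trace[0]
mu_val, cov_val = predict_fn(*[point[v.name] for v in free_vars])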

That’s a good piece of advice but in my case that setting appears to be overridden by whatever PyMC3 does when it imports, though that’s a separate issue.

I think I’ve seen this before and hadn’t been able to fix it. Something in Theano is definitely weird here. Is it that the Theano graph keeps expanding and isn’t freed, even though the result is overwritten by _?

Yup, there’s no state being kept intentionally. The size of the memory increment is roughly proportional to the size of the graph being built.

After looking through the code some more, I suspect that the issue could be due to repeated memoization of the GP Theano prediction function. @brandonwillard Do you think this issue might be solved by “Replace custom memoize module with cachetools” (pymc-devs/pymc3 PR #4509 on GitHub)?

The current form of the memoize replacement still uses an unbounded cache for methods (i.e. a normal dict). If the method memoization was causing this memory problem, then we can change to a bounded cache. Otherwise, I did replace the other non-method uses of memoization with a bounded LRU cache, so, if those were the cause, the PR should fix it.
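To illustrate the distinction (a generic cachetools sketch, not the actual PR code; Predictor and build_predictor are made-up names):

import operator
from cachetools import LRUCache, cachedmethod

class Predictor:
    """Made-up example: the old memoize module cached on what amounts to a plain
    dict, which never evicts; a bounded LRU cache caps how much stays alive."""

    def __init__(self):
        self._cache = LRUCache(maxsize=16)  # at most 16 entries are retained

    @cachedmethod(operator.attrgetter('_cache'))
    def build_predictor(self, key):
        # stands in for compiling an expensive Theano prediction function for `key`
        return object()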

I just changed the PR so that it uses an LRU cache for everything.


Great! I’ll try this out on the new version and see if it fixes the problem.

Hi

So I’m running into the same problem: when predicting, my model ends up hoarding over 1 TB of RAM (using pymc3 3.9.3)… Is the potential fix (Pull Request #4509) included in PyMC3 3.11.2 (the version from 14 March 2021)?

And can you @ckrapu confirm that the fix works?

Best regards

@Polichinel I would expect that to be the fix. Any reason you can’t simply upgrade and test?

Sounds good @twiecki - thanks for the swift reply.

Yes, my model is running on a very large central server shared by a couple of universities in Denmark. As such, I’m discouraged from installing software or updating libraries on my own if it can be avoided, so even small updates like this should go through the official pipeline.

Thus I just wanted to make sure that I was giving them (the server support) the correct info: that I want version 3.11.2. It appears they are already on it, so I’ll update you soon enough about whether the problem is solved 🙂


Unfortunately, it appears that the problem still occurs in version 3.11.2, running off the v3 branch. I’ve opened an issue and will be documenting more information there.

Hi @ckrapu

I don’t want to jinx anything, but the update does appear to have fixed the problem on my end! Right now my loop is at 1454/10677 iterations; at this point the model used to take up over 130 GB of RAM (on its way to over 1 TB…). Now it’s taking up 22 GB of RAM, and intriguingly that has not really changed since the loop began. A huge improvement.

Okay, that’s good to know. I’ve only been running 30 iterations to test, and I was seeing increments of ~10 MB each time, so maybe there is some other source of variation I’m unaware of. I originally had issues with a script (which I can’t share for work reasons) that uses much larger datasets, so now I can go back and check whether that’s working better.


The loop is done (10677/10677). Everything appears to be in order, and memory never went substantially over 22 GB of RAM, compared to over 1 TB before the update. Well done indeed!

Hopefully you’ll get similar results, @ckrapu, when you run your larger dataset through your model.

Cheers
