Very high memory usage with each prediction


I’m using this model to do some predictions:

import pymc3 as pm
import theano

# X, Y are the (already scaled/encoded) training inputs and targets
def save_model():
    with pm.Model() as model:

        l = pm.Gamma("l", alpha=2, beta=1)
        offset = pm.Gamma("offset", alpha=2, beta=1)
        nu = pm.HalfCauchy("nu", beta=1)

        # polynomial covariance over all input columns
        cov = nu ** 2 * pm.gp.cov.Polynomial(X.shape[1], l, 2, offset)

        gp = pm.gp.Marginal(cov_func=cov)

        sigma = pm.HalfCauchy("sigma", beta=1)
        y_ = gp.marginal_likelihood("y", X=X, y=Y, noise=sigma)

        mp = [pm.find_MAP()]

        # shared variable so Xnew can be swapped later without rebuilding the graph
        X_New_shared = theano.shared(X)

        f_pred = gp.conditional("f_pred", X_New_shared, shape=X_New_shared.get_value().shape[0])

I have saved this model with pickle using:

        import pickle

        with open('save/saved_model.p', 'wb') as buff:
            pickle.dump({'model': self.model, 'trace': mp,
                         'X_New_shared': X_New_shared, 'f_pred': f_pred,
                         'scaler': globalScaler, 'encoder': globalEncoder,
                         'gp': gp}, buff)

I’m trying to use this model to do predictions for API calls.

My current approach is to load this model (I know I’m pickling a lot of unnecessary stuff here):

    with open(const.model_path, 'rb') as buff:
        data = pickle.load(buff)
        return {
            "model": data['model'],
            "trace": data['trace'],
            "x_shared": data['X_New_shared'],
            "f_pred": data['f_pred'],
            "scaler": data['scaler'],
            "encoder": data['encoder'],
            "gp": data['gp'],
        }

and then do the predictions with:

    def predict_gp(self):
        with self.poly_model:
            mu, var = self.gp.predict(self.x_shared, point=self.trace[0], diag=True)
            return mu

So basically I change x_shared (via set_value) to get new predictions each time I call predict_gp(), roughly as in the sketch below.
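
For context, each API call does roughly this (new_X is a placeholder name for the incoming feature array):

    # per-request usage sketch; new_X stands in for the incoming
    # feature array, already scaled/encoded like the training data
    self.x_shared.set_value(new_X)  # swap in the new inputs
    mu = self.predict_gp()          # predict at new_X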

All this works fine, but there are two issues:

[1] Every time I run the predict_gp() method, memory usage increases by about 15 MB and is not freed after the prediction. The more I call this method, the more memory grows, until the system runs out of memory.

[2] I find it impossible to call the predict_gp() method concurrently, since it throws errors; at the moment I can only get one prediction after another. I reckon this is because changing the shared variable in the middle of a prediction somehow alters the Theano graph… (one way to at least make this safe would be to serialize the calls with a lock, as sketched below).
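
A minimal sketch of what I mean by serializing the calls (predict_gp_locked is just an illustrative name):

    import threading

    _predict_lock = threading.Lock()

    def predict_gp_locked(self, new_X):
        # set_value and the prediction must run as one atomic step,
        # since both operate on the same Theano shared variable
        with _predict_lock:
            self.x_shared.set_value(new_X)
            return self.predict_gp()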

I’m looking into how to solve this memory issue at the moment. I’ve read several threads about similar, though not identical, issues, e.g. “Excessive memory usage in PyMC3?” (solved there as an AWS Linux platform issue; it works on AWS Windows).

I tried using multiprocessing, which gets rid of the memory issue, but since Theano compiles from scratch every time, each prediction takes about 12 seconds, whereas it took around 600 ms before; the sketch below shows roughly what I tried.
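
Roughly the multiprocessing approach I mean (load_model is a hypothetical wrapper around the unpickling code above):

    import multiprocessing

    def _predict_in_subprocess(new_X, queue):
        # a fresh process releases all memory (including Theano's
        # compiled functions) on exit, but recompiling costs ~12 s
        data = load_model()  # hypothetical: the unpickling shown above
        data['x_shared'].set_value(new_X)
        with data['model']:
            mu, var = data['gp'].predict(data['x_shared'],
                                         point=data['trace'][0], diag=True)
        queue.put(mu)

    # one short-lived process per prediction
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=_predict_in_subprocess, args=(new_X, queue))
    proc.start()
    mu = queue.get()
    proc.join()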

Any thoughts on this?


Okay, so an additional observation. I tried to get predictions using this piece of code:

        with self.poly_model:
            pred_samples = pm.sample_posterior_predictive(self.trace, vars=[self.f_pred], samples=2000, random_seed=42)
            y_pred, uncer = pred_samples["f_pred"].mean(axis=0), pred_samples["f_pred"].std(axis=0)

The memory issue doesn’t occur then… I suppose this is because we’re not creating additional nodes in the Theano graph this way?
But this isn’t ideal for me because of the time it takes to draw samples.

I was wondering if I’m using the gp.predict() method wrong.

Does setting Xnew = x_shared create a new node in the Theano graph? If so, is there an alternative approach?

To me this seems like an issue with the implementation of the gp.predict() method itself, or with how I’ve combined shared variables with gp.predict(). I don’t see how this method could be adapted to the data container (pm.Data) approach either.


So I was finally able to solve this issue.

I encountered this error while running some JMeter tests that repeatedly call the gp.predict() method (pymc3/gp/ -> Marginal.predict()) with different Xnew values.

Upon running a Python memory profiler on this function, I tracked the unusual memory usage from draw_values() to _draw_value() and finally to _compile_theano_function() (all in pymc3/distributions/).

_compile_theano_function() has a @memoize decorator, which is a caching mechanism. My initial instinct was to remove the decorator, and my application ran without any issues afterwards. I checked whether the predictions changed and also compared the prediction times; there was no significant difference.

However, I later realized that memoize (pymc3.memoize) does have a clear_cache function. Calling this after each prediction seems to have solved the memory issue for me (pymc3.memoize.clear_cache()), without changing the prediction times or the predictions themselves. So this issue is solved :) The snippet below shows the final version.
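
For reference, the prediction method now looks roughly like this (same predict_gp() as above, plus the cache clearing):

    import pymc3

    def predict_gp(self):
        with self.poly_model:
            mu, var = self.gp.predict(self.x_shared, point=self.trace[0], diag=True)
        # drop the functions cached by the @memoize decorator on
        # _compile_theano_function, so the function compiled for this
        # prediction can be garbage-collected
        pymc3.memoize.clear_cache()
        return mu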