Memory requirements for a GP model

I am trying to fit earthquake data with a Gaussian process. I have ~15,000 observations with 2D locations (latitude, longitude). When I create the following model:

import pymc3 as pm

with pm.Model() as m:
    # kernel parameters
    theta = pm.Uniform('theta', lower=-1, upper=1)
    pi = pm.Uniform('pi', lower=-1, upper=1)
    rho = pm.HalfCauchy('rho', 5)

    # exponential kernel on the 2D inputs, scaled by theta, plus a constant offset pi
    cov = theta * pm.gp.cov.Exponential(2, ls=rho) + pi
    gp = pm.gp.Marginal(cov_func=cov)

    # model error
    sigma_noise = pm.HalfNormal("sigma_noise", sd=1, testval=0.05)
    cov_noise = pm.gp.cov.WhiteNoise(sigma_noise)

    # data likelihood
    y = gp.marginal_likelihood('y', X=lonlats_eq, y=observed.resid.values, noise=cov_noise)

Compiling this model seems to take over 32 GB of RAM. Is this expected? Is there anything that can be done to reduce the RAM usage?

It is expected, as the model builds a large covariance matrix (15,000 by 15,000). Did you try minibatch or the sparse approximations? http://docs.pymc.io/notebooks/GP-SparseApprox.html
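Something along these lines should be much lighter on memory (a rough sketch with the PyMC3 sparse GP API; the inducing-point count and the placeholder X/y are illustrative, not your data):

import numpy as np
import pymc3 as pm

# placeholders standing in for your (lon, lat) inputs and residuals
X = np.random.randn(15000, 2)
y = np.random.randn(15000)

# pick a few hundred inducing points with k-means
Xu = pm.gp.util.kmeans_inducing_points(200, X)

with pm.Model() as sparse_model:
    rho = pm.HalfCauchy('rho', 5)
    eta = pm.HalfNormal('eta', 1)
    cov = eta**2 * pm.gp.cov.Exponential(2, ls=rho)

    # the FITC approximation works with the n x m cross-covariance
    # instead of the full n x n covariance matrix
    gp = pm.gp.MarginalSparse(cov_func=cov, approx='FITC')

    sigma_noise = pm.HalfNormal('sigma_noise', sd=1)
    y_ = gp.marginal_likelihood('y', X=X, Xu=Xu, y=y, noise=sigma_noise)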

Okay. I guess it is going to get exciting when I add another set of coordinates. I will look into your suggestions.

RAM usage is very high in general with GP models, for the reason @junpenglao gave. Additionally, Theano appears to be fairly memory-inefficient when computing gradients through Cholesky decompositions. See this paper.
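As a back-of-the-envelope check (my own numbers, not measured from your model): a single dense 15,000 x 15,000 float64 matrix is already about 1.8 GB, and the Cholesky-based gradient keeps several matrices of that size alive at once, so blowing past 32 GB is not surprising:

n = 15000
gb_per_matrix = n * n * 8 / 1e9   # one dense float64 covariance-sized matrix
print(gb_per_matrix)              # ~1.8 GB; a handful of such temporaries adds up quickly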
