Memory requirements for a GP model

I am trying to fit earthquake data with a Gaussian process. I have ~15,000 observations with 2D locations (latitude, longitude). When I create the following model:

import pymc3 as pm

with pm.Model() as m:
    # kernel parameters
    theta = pm.Uniform('theta', lower=-1, upper=1)
    pi = pm.Uniform('pi', lower=-1, upper=1)
    rho = pm.HalfCauchy('rho', 5)

    # exponential kernel on the 2D inputs, scaled by theta, plus a constant offset pi
    cov = theta * pm.gp.cov.Exponential(2, ls=rho) + pi
    gp = pm.gp.Marginal(cov_func=cov)

    # model error
    sigma_noise = pm.HalfNormal("sigma_noise", sd=1, testval=0.05)
    cov_noise = pm.gp.cov.WhiteNoise(sigma_noise)

    # data likelihood
    y = gp.marginal_likelihood('y', X=lonlats_eq, y=observed.resid.values, noise=cov_noise)

Compiling this model seems to take over 32 GB of RAM. Is this expected? Is there anything that can be done to reduce the RAM usage?

It is expected, as the model builds a large covariance matrix (15,000 by 15,000). Did you try minibatch or the sparse approximations? http://docs.pymc.io/notebooks/GP-SparseApprox.html
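Something along these lines should be much lighter on memory (a rough sketch with the PyMC3 sparse GP API; the inducing-point count and the placeholder X/y are illustrative, not your data):

import numpy as np
import pymc3 as pm

# placeholders standing in for your (lon, lat) inputs and residuals
X = np.random.randn(15000, 2)
y = np.random.randn(15000)

# pick a few hundred inducing points with k-means
Xu = pm.gp.util.kmeans_inducing_points(200, X)

with pm.Model() as sparse_model:
    rho = pm.HalfCauchy('rho', 5)
    eta = pm.HalfNormal('eta', 1)
    cov = eta**2 * pm.gp.cov.Exponential(2, ls=rho)

    # the FITC approximation works with the n x m cross-covariance
    # instead of the full n x n covariance matrix
    gp = pm.gp.MarginalSparse(cov_func=cov, approx='FITC')

    sigma_noise = pm.HalfNormal('sigma_noise', sd=1)
    y_ = gp.marginal_likelihood('y', X=X, Xu=Xu, y=y, noise=sigma_noise)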

Okay. I guess it is going to get exciting when I add another set of coordinates. I will look into your suggestions.

RAM usage is very high in general with GP models, for the reason @junpenglao gave. Additionally, Theano appears to be fairly memory-inefficient when computing gradients through Cholesky decompositions. See this paper.
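As a back-of-the-envelope check (my own numbers, not measured from your model): a single dense 15,000 x 15,000 float64 matrix is already about 1.8 GB, and the Cholesky-based gradient keeps several matrices of that size alive at once, so blowing past 32 GB is not surprising:

n = 15000
gb_per_matrix = n * n * 8 / 1e9   # one dense float64 covariance-sized matrix
print(gb_per_matrix)              # ~1.8 GB; a handful of such temporaries adds up quickly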
