Excessive memory - Multiple regression

Hi there!

I’m starting to use pymc3 and I’m currently attempting to recover some known parameters, but I’m running into some issues with my current code.

There are two steps that take up ALL of my memory:

  • Initializing NUTS with advi
  • Drawing/sampling using NUTS

I’ve already followed some of the online instructions that suggest using find_MAP for the NUTS initialization. This stops the first item in my list above from draining my memory, but the NUTS step is still extremely slow and ends up draining all of my memory anyway.

Model

The model I’m trying to fit is a bivariate regression like this:

N : number of observations (in this case = 1500)
dim : number of dimensions of the outcome (in this case = 2)
n_xvars : number of independent variables per observation (in this case = 6)

y : matrix containing the dependent variables - has dimensions N x dim
x : matrix containing the independent variables - has dimensions N x n_xvars
beta : matrix containing the influences of each independent variable on the final outcomes - has dimensions n_xvars x dim
cov_mtx : covariance matrix used to model dependent variables - has dimensions dim x dim
xbeta : result of matrix-multiplication of x and beta - has dimensions N x dim

Model:
y ~ MultivariateNormal(mean=xbeta, cov=cov_mtx)

It’s also worth noting that the covariance matrix is estimated in two separate steps: the correlations are given an LKJ prior while the standard deviations are given a uniform prior. Then, knowing the standard deviations and the correlations, we build up the actual covariance matrix.
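To make that two-step construction concrete, here is a tiny standalone numpy illustration (made-up numbers, not part of the model):

import numpy as np

sd   = np.array([2.0, 0.5])                    # standard deviations
corr = np.array([[1.0, 0.4],
                 [0.4, 1.0]])                  # correlation matrix
cov  = np.diag(sd).dot(corr).dot(np.diag(sd))  # dim x dim covariance matrix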

Here is a link to the data used. The sample code is copied below.

Sample code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
import theano.tensor as tt

bivariate_data = pd.read_csv("fake_data_bivariate_regression.csv")

x_vars  = ["x0","x1","x2","x3","x4","x5"]
y_vars  = ["y1","y2"]

num_obs = len(bivariate_data)

n_xvar = len(x_vars)

y_data  = bivariate_data[y_vars].values
x_data  = bivariate_data[x_vars].values

dim = len(y_vars)

# We will use separate priors for sigma and the correlation matrix.
# In order to convert the upper triangular correlation values to a
# complete correlation matrix, we need to construct an index matrix:
n_elem = dim * (dim - 1) // 2  # integer division so n_elem is an int under Python 3
tri_index = np.zeros([dim, dim], dtype=int)
tri_index[np.triu_indices(dim, k=1)] = np.arange(n_elem)
tri_index[np.triu_indices(dim, k=1)[::-1]] = np.arange(n_elem)
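# (For dim = 2, n_elem = 1 and tri_index is all zeros, so indexing the
# length-1 correlation vector with it gives [[c, c], [c, c]]; filling the
# diagonal with 1 inside the model below then yields the full correlation matrix.)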

with pm.Model() as bivariate_regression:
    # Lower and upper bounds on standard deviations
    sigma_lower   = np.zeros(dim)
    sigma_upper   = np.ones(dim) * 10
    
    # Priors on the standard deviations
    sigma         = pm.Uniform('sigma', lower=sigma_lower, upper=sigma_upper, shape=dim)
    
    # LKJ Prior on the correlation terms
    corr_triangle = pm.LKJCorr('corr', n=1, p=dim)
    
    # Building an actual covariance matrix
    corr_matrix = corr_triangle[tri_index]
    corr_matrix = tt.fill_diagonal(corr_matrix, 1)
    cov_matrix  = tt.diag(sigma).dot(corr_matrix.dot(tt.diag(sigma)))
    
    # Uninformed priors on the betas
    betas = pm.Normal('betas', mu=0.0, sd=10, shape=(n_xvar,dim))
    
    # Creating the center of the multivariate distribution for each observation
    xbeta = tt.dot(x_data,betas)
    
    # Setting the likelihood of the observed data. y~MultiNorm(mean=x*beta, cov=cov_mtx)
    y = pm.MvNormal('y', mu=xbeta, cov=cov_matrix, shape=dim, observed=y_data)

n_samples = 1000
n_burn    =  500

with bivariate_regression:
    start = pm.find_MAP()
    step  = pm.NUTS(state=start)
    trace = pm.sample(draws=n_samples, step=step)

pm.traceplot(trace)

Tracebacks

When I don’t use the find_MAP procedure, I run out of memory when initializing NUTS. Here’s the traceback:

Average ELBO = -4,611.4:  98%|█████████▊| 196755/200000 [02:20<00:02, 1405.00it/s]
Traceback (most recent call last):

  File "<ipython-input-1-31ea75d96efc>", line 68, in <module>
    trace = pm.sample(draws=n_samples)

  File "C:\Anaconda2\envs\py36\lib\site-packages\pymc3\sampling.py", line 149, in sample
    start_, step = init_nuts(init=init, n_init=n_init, model=model)

  File "C:\Anaconda2\envs\py36\lib\site-packages\pymc3\sampling.py", line 434, in init_nuts
    v_params = pm.variational.advi(n=n_init)

  File "C:\Anaconda2\envs\py36\lib\site-packages\pymc3\variational\advi.py", line 147, in advi
    uw_i, e = f()

  File "C:\Anaconda2\envs\py36\lib\site-packages\theano\compile\function_module.py", line 898, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))

  File "C:\Anaconda2\envs\py36\lib\site-packages\theano\gof\link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)

  File "C:\Anaconda2\envs\py36\lib\site-packages\six.py", line 685, in reraise
    raise value.with_traceback(tb)

  File "C:\Anaconda2\envs\py36\lib\site-packages\theano\compile\function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\

MemoryError: 
Apply node that caused the error: Elemwise{mul}(TensorConstant{(1500, 1) of 0.5}, Gemm{no_inplace}.0)
Toposort index: 36
Inputs types: [TensorType(float64, col), TensorType(float64, matrix)]
Inputs shapes: [(1500, 1), (1500, 2)]
Inputs strides: [(8, 8), (16, 8)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Gemm{inplace}(InplaceDimShuffle{1,0}.0, TensorConstant{1.0}, InplaceDimShuffle{1,0}.0, Elemwise{mul}.0, TensorConstant{-750.0}), Gemm{inplace}(Elemwise{Mul}[(0, 1)].0, TensorConstant{-1.0}, Elemwise{mul}.0, InplaceDimShuffle{1,0}.0, TensorConstant{-1.0})]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

And here is the traceback from when I do use the find_MAP procedure:

 15%|█▍        | 149/1000 [01:55<11:32,  1.23it/s]---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
C:\Anaconda2\envs\py36\lib\site-packages\theano\compile\function_module.py in __call__(self, *args, **kwargs)
    883             outputs =\
--> 884                 self.fn() if output_subset is None else\
    885                 self.fn(output_subset=output_subset)

MemoryError: 

During handling of the above exception, another exception occurred:

MemoryError                               Traceback (most recent call last)
<ipython-input-3-59a9a0691c36> in <module>()
----> 1 import pandas as pd
      2 import numpy as np
      3 import matplotlib.pyplot as plt
      4 import pymc3 as pm
      5 import theano.tensor as tt

C:\Anaconda2\envs\py36\lib\site-packages\pandas\__init__.py in <module>()
     57 from pandas.util._print_versions import show_versions
     58 from pandas.io.api import *
---> 59 from pandas.util._tester import test
     60 import pandas.testing
     61 

C:\Anaconda2\envs\py36\lib\site-packages\pandas\util\_tester.py in <module>()
      9 
     10 try:
---> 11     import pytest
     12 except ImportError:
     13     def test():

D:\Document Folders\Documents\pytest.py in <module>()
     51     start = pm.find_MAP()
     52     step  = pm.NUTS(state=start)
---> 53     trace = pm.sample(draws=n_samples, step=step)
     54 
     55 pm.traceplot(trace)

C:\Anaconda2\envs\py36\lib\site-packages\pymc3\sampling.py in sample(draws, step, init, n_init, start, trace, chain, njobs, tune, progressbar, model, random_seed)
    173         sample_func = _sample
    174 
--> 175     return sample_func(**sample_args)
    176 
    177 

C:\Anaconda2\envs\py36\lib\site-packages\pymc3\sampling.py in _sample(draws, step, start, trace, chain, tune, progressbar, model, random_seed)
    183         sampling = tqdm(sampling, total=draws)
    184     try:
--> 185         for strace in sampling:
    186             pass
    187     except KeyboardInterrupt:

C:\Anaconda2\envs\py36\lib\site-packages\tqdm\_tqdm.py in __iter__(self)
    870 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
    871 
--> 872             for obj in iterable:
    873                 yield obj
    874                 # Update and print the progressbar.

C:\Anaconda2\envs\py36\lib\site-packages\pymc3\sampling.py in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
    265         if i == tune:
    266             step = stop_tuning(step)
--> 267         point = step.step(point)
    268         strace.record(point)
    269         yield strace

C:\Anaconda2\envs\py36\lib\site-packages\pymc3\step_methods\arraystep.py in step(self, point)
    140         bij = DictToArrayBijection(self.ordering, point)
    141 
--> 142         apoint = self.astep(bij.map(point))
    143         return bij.rmap(apoint)
    144 

C:\Anaconda2\envs\py36\lib\site-packages\pymc3\step_methods\nuts.py in astep(self, q0)
    138             if v == -1:
    139                 qn, pn, _, _, q1, n1, s1, a, na = buildtree(
--> 140                     leapfrog, qn, pn, u, v, j, e, Emax, E0)
    141             else:
    142                 _, _, qp, pp, q1, n1, s1, a, na = buildtree(

C:\Anaconda2\envs\py36\lib\site-packages\pymc3\step_methods\nuts.py in buildtree(leapfrog1_dE, q, p, u, v, j, e, Emax, E0)
    183             if v == -1:
    184                 qn, pn, _, _, q11, n11, s11, a11, na11 = buildtree(
--> 185                     leapfrog1_dE, qn, pn, u, v, j - 1, e, Emax, E0)
    186             else:
    187                 _, _, qp, pp, q11, n11, s11, a11, na11 = buildtree(

C:\Anaconda2\envs\py36\lib\site-packages\pymc3\step_methods\nuts.py in buildtree(leapfrog1_dE, q, p, u, v, j, e, Emax, E0)
    183             if v == -1:
    184                 qn, pn, _, _, q11, n11, s11, a11, na11 = buildtree(
--> 185                     leapfrog1_dE, qn, pn, u, v, j - 1, e, Emax, E0)
    186             else:
    187                 _, _, qp, pp, q11, n11, s11, a11, na11 = buildtree(

C:\Anaconda2\envs\py36\lib\site-packages\pymc3\step_methods\nuts.py in buildtree(leapfrog1_dE, q, p, u, v, j, e, Emax, E0)
    179     else:
    180         qn, pn, qp, pp, q1, n1, s1, a1, na1 = buildtree(
--> 181             leapfrog1_dE, q, p, u, v, j - 1, e, Emax, E0)
    182         if s1 == 1:
    183             if v == -1:

C:\Anaconda2\envs\py36\lib\site-packages\pymc3\step_methods\nuts.py in buildtree(leapfrog1_dE, q, p, u, v, j, e, Emax, E0)
    183             if v == -1:
    184                 qn, pn, _, _, q11, n11, s11, a11, na11 = buildtree(
--> 185                     leapfrog1_dE, qn, pn, u, v, j - 1, e, Emax, E0)
    186             else:
    187                 _, _, qp, pp, q11, n11, s11, a11, na11 = buildtree(

C:\Anaconda2\envs\py36\lib\site-packages\pymc3\step_methods\nuts.py in buildtree(leapfrog1_dE, q, p, u, v, j, e, Emax, E0)
    183             if v == -1:
    184                 qn, pn, _, _, q11, n11, s11, a11, na11 = buildtree(
--> 185                     leapfrog1_dE, qn, pn, u, v, j - 1, e, Emax, E0)
    186             else:
    187                 _, _, qp, pp, q11, n11, s11, a11, na11 = buildtree(

C:\Anaconda2\envs\py36\lib\site-packages\pymc3\step_methods\nuts.py in buildtree(leapfrog1_dE, q, p, u, v, j, e, Emax, E0)
    183             if v == -1:
    184                 qn, pn, _, _, q11, n11, s11, a11, na11 = buildtree(
--> 185                     leapfrog1_dE, qn, pn, u, v, j - 1, e, Emax, E0)
    186             else:
    187                 _, _, qp, pp, q11, n11, s11, a11, na11 = buildtree(

C:\Anaconda2\envs\py36\lib\site-packages\pymc3\step_methods\nuts.py in buildtree(leapfrog1_dE, q, p, u, v, j, e, Emax, E0)
    179     else:
    180         qn, pn, qp, pp, q1, n1, s1, a1, na1 = buildtree(
--> 181             leapfrog1_dE, q, p, u, v, j - 1, e, Emax, E0)
    182         if s1 == 1:
    183             if v == -1:

C:\Anaconda2\envs\py36\lib\site-packages\pymc3\step_methods\nuts.py in buildtree(leapfrog1_dE, q, p, u, v, j, e, Emax, E0)
    172 def buildtree(leapfrog1_dE, q, p, u, v, j, e, Emax, E0):
    173     if j == 0:
--> 174         q1, p1, dE = leapfrog1_dE(q, p, array(v * e), E0)
    175 
    176         n1 = int(log(u) + dE <= 0)

C:\Anaconda2\envs\py36\lib\site-packages\theano\compile\function_module.py in __call__(self, *args, **kwargs)
    896                     node=self.fn.nodes[self.fn.position_of_error],
    897                     thunk=thunk,
--> 898                     storage_map=getattr(self.fn, 'storage_map', None))
    899             else:
    900                 # old-style linkers raise their own exceptions

C:\Anaconda2\envs\py36\lib\site-packages\theano\gof\link.py in raise_with_op(node, thunk, exc_info, storage_map)
    323         # extra long error message in that case.
    324         pass
--> 325     reraise(exc_type, exc_value, exc_trace)
    326 
    327 

C:\Anaconda2\envs\py36\lib\site-packages\six.py in reraise(tp, value, tb)
    683             value = tp()
    684         if value.__traceback__ is not tb:
--> 685             raise value.with_traceback(tb)
    686         raise value
    687 

C:\Anaconda2\envs\py36\lib\site-packages\theano\compile\function_module.py in __call__(self, *args, **kwargs)
    882         try:
    883             outputs =\
--> 884                 self.fn() if output_subset is None else\
    885                 self.fn(output_subset=output_subset)
    886         except Exception:

MemoryError: 
Apply node that caused the error: Elemwise{mul}(TensorConstant{(1500, 1) of -0.5}, Gemm{no_inplace}.0)
Toposort index: 13
Inputs types: [TensorType(float64, col), TensorType(float64, matrix)]
Inputs shapes: [(1500, 1), (1500, 2)]
Inputs strides: [(8, 8), (16, 8)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Gemm{inplace}(Elemwise{Mul}[(0, 1)].0, TensorConstant{-1.0}, Elemwise{mul}.0, InplaceDimShuffle{1,0}.0, TensorConstant{-1.0}), Gemm{inplace}(InplaceDimShuffle{1,0}.0, TensorConstant{1.0}, InplaceDimShuffle{1,0}.0, Elemwise{mul}.0, TensorConstant{750.0})]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Versions and main components

  • PyMC3 Version: 3.0
  • Theano Version: 0.9.0.dev-c697eeab84e5b8a74908da654b66ec9eca4f1291
  • Python Version: 3.6.1
  • Operating system: Windows 8.1
  • PyMC3 installation: conda

Computer specs

  • Processor: Intel i7 - 4 cores @ 2.10GHz
  • RAM: 8.00 GB

I apologize for the long post. I’d appreciate any kind of feedback!

Thank you!!!

It is no longer recommended to use the MAP estimate as the starting point for NUTS; more often than not, the MAP is not in the typical set.

I think there are a few places where your model could be optimized. Below is how I would parameterize it (using the default initialization for NUTS, parameterizing the MvNormal with the Cholesky-factored LKJ prior, etc.).

with pm.Model() as bivariate_regression:
    # Note that we access the distribution for the standard
    # deviations, and do not create a new random variable.
    sd_dist = pm.HalfCauchy.dist(beta=2.5)
    packed_chol = pm.LKJCholeskyCov('chol_cov', n=dim, eta=1, sd_dist=sd_dist)
    # Unpack the Cholesky factor into a lower-triangular matrix
    chol = pm.expand_packed_triangular(dim, packed_chol, lower=True)
    
    # Recover the covariance matrix and the standard deviations
    cov = tt.dot(chol, chol.T)
    sd = pm.Deterministic('sd', tt.sqrt(tt.diag(cov)))
    
    # Uninformed priors on the betas
    betas = pm.Normal('betas', mu=0.0, sd=10, shape=(n_xvar,dim))
    
    # Creating the center of the multivariate distribution for each observation
    xbeta = tt.dot(x_data, betas)
    y = pm.MvNormal('y', mu=xbeta, chol=chol, observed=y_data)

    trace = pm.sample(1000)

Also, you are on version 3.0, which is out of date. You can update using conda update -c conda-forge pymc3.
About the memory issue: are you on Amazon AWS? There seems to be a very strange memory leak there; see the thread “Excessive memory usage in PyMC3?” (solved: it was an AWS Linux platform issue; it works on AWS Windows).

Hello!

Thank you so much for your suggestions, @junpenglao and @aseyboldt! I’ve updated pymc3 to version 3.2 and switched to the Cholesky parametrization, and memory utilization is now fine! It still takes quite some time to run, but it works!

But @junpenglao, I still have a couple of questions about the Cholesky parametrization you’re proposing.

I’ve read in the pymc3 docs that packed_chol holds a packed covariance matrix, which you “unpack” in the expand statement to generate chol. But then, to get the standard deviations, you multiply chol by its transpose and take the square root of the resulting diagonal. Aren’t you actually getting the variances that way? In other words, if you wanted the standard deviations, wouldn’t it be enough to run sd = pm.Deterministic('sd', tt.sqrt(tt.diag(chol))) instead?
The version you suggested does recover the actual standard deviations I used to generate the data, but why?
If chol were the actual covariance matrix, why couldn’t we just take tt.sqrt(tt.diag(chol))?

Also, is there any interesting/smart way to keep track of the correlation terms? Like, creating a Deterministic node like we did for the standard deviations, but that keeps track of the correlation terms?

And answering your question, @aseyboldt, no, I’m not on an Amazon AWS. Just a regular PC running windows.

Thanks again for the suggestions!

pm.LKJCholeskyCov returns the Cholesky decomposition of the covariance matrix, not the covariance matrix itself. It is a lower-triangular matrix, which is why computing cov = tt.dot(chol, chol.T) recovers the covariance matrix. In particular, the diagonal of chol is not the variances: the i-th diagonal entry of chol.dot(chol.T) is the sum of squares of the i-th row of chol, which equals the variance of the i-th component, so tt.sqrt(tt.diag(cov)) gives the standard deviations.

To get the correlation matrix, you can do:

corr = tt.diag(sd**-1).dot(cov.dot(tt.diag(sd**-1)))
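If it helps to see this outside of theano, here is a quick standalone numpy check (made-up numbers, illustrative only) of both facts:

import numpy as np

sd_true   = np.array([1.5, 0.5])
corr_true = np.array([[1.0, 0.3],
                      [0.3, 1.0]])
cov_true  = np.diag(sd_true).dot(corr_true).dot(np.diag(sd_true))

L = np.linalg.cholesky(cov_true)       # lower-triangular Cholesky factor
print(np.sqrt(np.diag(L.dot(L.T))))    # [1.5 0.5] -> the true sds
print(np.sqrt(np.diag(L)))             # NOT the sds: L is a factor, not cov
print(np.diag(1 / sd_true).dot(cov_true).dot(np.diag(1 / sd_true)))  # -> corr_true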

Thanks for the suggestion on how to get the correlation matrix @junpenglao! I had figured out another way of doing it, but it was super convoluted - this is much more elegant, thanks!

I’ve updated my code using your suggestions and adapted it for a 4-variate regression, but I’m still running into memory issues.

The 8 GB of memory on my machine gets completely used up and the program crashes during sampling. I tried the code on another, more powerful computer (32 GB of RAM); the program didn’t crash, but it did use up all of that machine’s RAM. Is there anything I could do to reduce this issue?

Data

Link to multivariate data: https://www.dropbox.com/s/kgo34kv0qlcb3p4/fake_data_multivariate_regression.csv?dl=0

Sample code

multivariate_data = pd.read_csv("fake_data_multivariate_regression.csv")

x_vars  = ["x0","x1","x2","x3","x4","x5"]
y_vars  = ["y1","y2","y3","y4"]

num_obs = len(multivariate_data)

n_xvar = len(x_vars)

y_data  = multivariate_data[y_vars].values
x_data  = multivariate_data[x_vars].values

dim = len(y_vars)

# Used to extract the correlation terms
cor_term_indices = np.triu_indices(dim, k=1)

with pm.Model() as multivariate_regression:
    # Uninformed priors for the covariance matrix using Cholesky factors
    sd_dist = pm.HalfCauchy.dist(beta=2.5)
    packed_chol = pm.LKJCholeskyCov('chol_fact', n=dim, eta=1, sd_dist=sd_dist)
    
    # Transform the packed cholesky factor (array) into an unpacked factor (matrix)
    chol = pm.expand_packed_triangular(dim, packed_chol, lower=True)
    
    # Generate covariance matrix from Cholesky factor
    cov_mtx = tt.dot(chol, chol.T)

    # Standard deviations from the Cholesky factor
    sd = pm.Deterministic('sd', tt.sqrt(tt.diag(cov_mtx)))

    # Tensor manipulation to extract the correlation terms
    cor_mtx = tt.diag(sd**-1).dot(cov_mtx).dot(tt.diag(sd**-1))
    cor_terms = pm.Deterministic('cor_terms', cor_mtx[cor_term_indices])
    
    # Uninformed priors on the betas
    betas = pm.Normal('betas', mu=0.0, sd=10, shape=(n_xvar,dim))
    
    # Creating the center of the multivariate distribution for each observation
    y = pm.MvNormal('y', mu=tt.dot(x_data, betas), chol=chol, observed=y_data)

n_samples = 1000
n_burn    =  500
n_thin    =   10

with multivariate_regression:
    step  = pm.NUTS()
    trace = pm.sample(draws=n_samples, step=step, tune=n_burn)

pm.traceplot(trace)
plt.show()

Again, thank you for all the support!!!

The memory issue is very strange; I can’t even reproduce it locally. This is not the first time we’ve heard something like this, so a memory leak might be hiding somewhere (possibly in theano).

The sampler has a very hard time getting started with your model. Initialising with advi might help (set init='advi+adapt_diag'), but I’d first just try rescaling the variables, maybe something like this:

from sklearn import preprocessing

scaler_x = preprocessing.StandardScaler()
x_data_ = scaler_x.fit_transform(x_data)
x_data_[:, 0] = 1  # This is just an intercept, right?

scaler_y = preprocessing.StandardScaler()
y_data_ = scaler_y.fit_transform(y_data)

Of course, this changes how you’d interpret the beta values (the sketch just below shows one way to map them back). Also note that without some kind of rescaling the prior for beta is very strange: some of the true beta values end up near -80, which is 8 standard deviations out under a Normal(0, 10) prior.
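If you later want the betas back on the original scale, a sketch like this should work (it assumes the scaler_x, scaler_y, and trace names from above, and only handles the slopes, since the centering shifts fold into the coefficient of the intercept column):

import numpy as np

sx = scaler_x.scale_.copy()
sx[0] = 1.0                  # column 0 was overwritten with the intercept
sy = scaler_y.scale_

beta_std  = trace['betas'].mean(axis=0)            # posterior mean, shape (n_xvar, dim)
beta_orig = beta_std * sy[None, :] / sx[:, None]   # slopes on the original scale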

Hi @aseyboldt,

I’ve tried these and here are my results:

  • Just setting the ‘init’ parameter: memory gets all chomped up
  • Just rescaling the variables: the program definitely uses less memory, but it seems like the fundamental problem is still there, somewhere
  • Setting the ‘init’ parameter and rescaling the variables: same as just rescaling

Really? How so? My rationale here was that since I don’t know whether all of the x’s are part of every y’s regression, and I also don’t know the direction of their influences, I’d just start them off centered at zero with a wide prior; that’s why the Normal(mu=0, sd=10) came to mind. I think I’ve seen a couple of other regression approaches that use this Normal(mu=0, sd=10) prior on the betas, maybe not in a multivariate case where we’re simultaneously estimating multiple regression lines, but still.
Also, when I ran the 4-variate code from my previous post on the more powerful computer, I was in fact able to recover all of the true beta parameters; none of them got stuck at -80. All of that computer’s 32 GB of RAM got used up in the computation, but it still recovered the original beta parameters.
What would you suggest as a more interesting prior on the betas?

Thanks again for all the helpful suggestions!

Is this on AWS by any chance?

It’s not AWS related:

I can’t speak to the memory issue either; I cannot reproduce it in my Linux environment.