Excessive memory usage in PyMC3? (Solved - AWS Linux platform issue. Works on AWS Windows)

Hello,

As a prelude to beginning development with PyMC3, I wanted to make sure I had a machine and environment that was capable of running some basic demos.

I am using an Amazon AWS instance running Ubuntu, with 32 cores and 244GB memory (so quite a hefty machine).

On a basic demo such as the following notebook, I find that the machine bogs down completely at less than 20% progress on the cell containing:

import pymc3 as pm
# `neural_network` is the model defined earlier in the notebook
with neural_network:
    trace = pm.sample(1000, tune=200)

(notebook available at: https://github.com/twiecki/WhileMyMCMCGentlySamples/blob/master/content/downloads/notebooks/random_walk_deep_net.ipynb)

The machine has >99% of its memory (out of 244GB) allocated and is spending nearly all its time swapping to disk. I have a hard time believing that a basic demo is so memory-intensive, and am wondering if there may be a memory leak in the recent release of PyMC3.

Thanks for any advice…

Are you running on a GPU? Also, which Theano and PyMC3 versions are you running?

Hello,

It’s Theano 0.9.0, PyMC3 3.2.

It's running CPU only (as the AWS instance I am currently using does not have a GPU). I can move to a GPU instance if that is a problem. (But the issue isn't execution speed; it's memory usage.)

Thanks!

Yeah, it is likely unrelated to GPU/CPU. Any idea, @twiecki?

This could be due to the pre-initialization of the trace, as the model is fairly high-dimensional. Can you try with the HDF5 backend?
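Something along these lines (a sketch, assuming the HDF5 backend is importable from pymc3.backends.hdf5; the filename 'trace.h5' is arbitrary):

import pymc3 as pm
from pymc3.backends.hdf5 import HDF5

with neural_network:
    # stream samples to an on-disk HDF5 file instead of holding
    # the full trace in memory
    trace = pm.sample(1000, tune=200, trace=HDF5('trace.h5'))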

Thanks for the suggestion. I’ll give it a try.

The behavior is that memory allocation grows steadily until the run is about 20% complete, at which point it reaches >99% of memory and starts to swap. The allocation doesn't appear to happen all up front, but rather is continuous during execution.

I will let you know the results…

@twiecki & @junpenglao,

I can't get it to work under any circumstances. The behavior is always the same: it allocates approximately 100MB of additional memory per second, and allocated memory keeps growing until it exceeds the machine's physical capacity, at which point the process degrades completely due to swapping. I have tried 61GB, 122GB and 244GB machines; all produce the same outcome, except that the larger-memory instances last longer before they start to thrash.

It dies during the initialization process, using ADVI (v3.1) or jitter+adapt_diag (v3.2), usually at about 20% complete depending on machine specs.
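
For reference, the init method can also be selected explicitly via pm.sample's init argument in PyMC3 3.x (a sketch):

with neural_network:
    # v3.1 default initialization
    trace = pm.sample(1000, tune=200, init='advi')
    # v3.2 default initialization
    # trace = pm.sample(1000, tune=200, init='jitter+adapt_diag')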

Ubuntu 16.04, Python 3.5.2, Theano 0.9.0

I have tried all combinations of the following:

  • PyMC3 v3.2 and v3.1
  • Default backend
  • HDF5 backend

I would love to use PyMC3 for a computing project, but I can't proceed with any confidence unless I can find a way to complete runs within a reasonable amount of memory.

I would appreciate any insight you might have. Is there a known-good configuration available within AWS (e.g. a combination of operating system, Python version, PyMC3 version, etc.)?

Thanks again for your assistance!

Wondering if this issue may be relevant? It looks like another user had memory issues running on Linux in AWS that did not appear on their personal machine.

Thanks!

@twiecki & @junpenglao,

Thought you might be interested: it is definitely a platform issue with Linux on AWS. (I only tried Ubuntu, so I don't know about other flavors of Linux.)

See other reports, e.g. the issue linked earlier in the thread.

On AWS Linux, running the notebook linked earlier in the thread allocates well over 244GB and kills the machine.

On AWS Windows, the process is stable at 160MB. Not 160GB, but 160MB.

Thanks for your attention to this. I will file an issue on GitHub if you want me to; please let me know.

Thanks for reporting the solution. It's really puzzling that memory is managed so differently on AWS Linux. This probably makes it a Theano issue, but definitely open an issue (or comment on the one you dug up) with your findings.

Hi,

I am having the same memory-leak problem. My input dataset is barely 200MB, with shape 1.2M x 19.

I ran sampling on my local machine. It works fine on a smaller sample of the dataset, but when I plug in the full dataset it consumes more and more memory until it starts swapping to disk and eventually kills the session.

I tried it on a heavy AWS SageMaker instance, but it does the same thing there.

Using PyMC v5, Python 3.9.
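
For context, my run looks roughly like the sketch below (a placeholder model with hypothetical variable names; only the data shape matches my real setup, assuming PyMC v5's API):

import numpy as np
import pymc as pm

# placeholder stand-in for the real ~200MB dataset (1.2M rows x 19 columns)
X = np.random.randn(1_200_000, 19)
y = np.random.randn(1_200_000)

with pm.Model():
    beta = pm.Normal("beta", 0, 1, shape=19)
    sigma = pm.HalfNormal("sigma", 1)
    mu = pm.math.dot(X, beta)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=y)
    # memory grows steadily during this call until the session is killed
    idata = pm.sample()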

Can anyone please help me with this?