The machine has >99% of its 244 GB of memory allocated and is spending nearly all its time swapping to disk. I have a hard time believing that a basic demo is so memory-intensive, and am wondering whether there may be a memory leak in the recent PyMC3 release.
It’s running CPU-only, since the AWS instance I am currently using does not have a GPU. I can move to a GPU instance if that is a problem, but the issue isn’t execution speed; it’s memory usage.
The behavior is that memory allocation grows steadily until the run is about 20% complete, at which point it exceeds 99% of memory and the machine starts to swap. The allocation does not appear to happen all up front; rather, it is continuous throughout execution.
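For what it’s worth, here is a minimal sketch of how the growth can be watched from inside the process, assuming psutil is installed; the model below is only a stand-in, not the demo notebook:

```python
import threading
import time

import psutil
import pymc3 as pm

def log_rss(interval=5.0):
    """Print this process's resident memory every `interval` seconds."""
    proc = psutil.Process()
    while True:
        print("RSS: {:.2f} GB".format(proc.memory_info().rss / 1024 ** 3))
        time.sleep(interval)

# Background logger; note it only tracks the parent process, so chains
# running in worker subprocesses are not counted.
threading.Thread(target=log_rss, daemon=True).start()

with pm.Model():
    mu = pm.Normal("mu", mu=0, sd=1)  # stand-in model
    pm.Normal("obs", mu=mu, sd=1, observed=[0.1, -0.3, 0.8])
    trace = pm.sample(2000)
```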
I can’t get it to work under any circumstances. The behavior is always the same: it allocates roughly 100 MB of additional memory per second, and allocated memory keeps growing until it exceeds the machine’s physical capacity, at which point the process grinds to a halt from swapping. I have tried 61 GB, 122 GB, and 244 GB machines; all produce the same outcome, except that the larger instances simply take longer to start thrashing.
It dies during initialization, whether using ADVI (v3.1) or jitter+adapt_diag (v3.2), usually around 20% of the way through depending on machine specs.
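For context, these are the two initializers selected via the `init` argument of `pm.sample` (a minimal sketch with a placeholder model; jitter+adapt_diag only exists from v3.2 onward):

```python
import pymc3 as pm

with pm.Model():
    mu = pm.Normal("mu", mu=0, sd=1)  # placeholder model
    pm.Normal("obs", mu=mu, sd=1, observed=[0.2, -0.4, 1.1])

    # Default initialization in PyMC3 v3.1
    trace = pm.sample(1000, init="advi")

    # Default initialization in PyMC3 v3.2
    trace = pm.sample(1000, init="jitter+adapt_diag")
```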
Ubuntu 16.04, Python 3.5.2, Theano 0.9.0
I have tried all combinations of the following:
PyMC3 v3.2 and v3.1
Default backend
HDF5 backend
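For reference, a minimal sketch of how I understand the two backend configurations, assuming the HDF5 backend is exposed as `pm.backends.HDF5` in these releases (the model and the `trace.h5` filename are placeholders):

```python
import pymc3 as pm

with pm.Model():
    mu = pm.Normal("mu", mu=0, sd=1)  # placeholder model
    pm.Normal("obs", mu=mu, sd=1, observed=[0.5, 1.2, -0.7])

    # Default backend: trace held in memory (NDArray)
    trace_mem = pm.sample(1000)

    # HDF5 backend: trace written to disk instead
    db = pm.backends.HDF5("trace.h5")  # illustrative filename
    trace_disk = pm.sample(1000, trace=db)
```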
I would love to use PyMC3 for a computing project, but I can’t proceed with any confidence unless I can find a way to complete runs within a reasonably available amount of memory.
I would appreciate any insight you might have. Is there a known-good configuration on AWS (i.e. a combination of operating system, Python version, PyMC3 version, etc.)?
I’m wondering if this issue may be relevant. It looks like another user had memory issues running on Linux in AWS that did not appear on their personal machine.
Thought you might be interested: it is definitely a platform issue with Linux on AWS. (I only tried Ubuntu, so I don’t know about other flavors of Linux.)
See other reports, e.g.
On AWS Linux, running the notebook linked earlier in the thread allocates >> 244 GB and kills the machine.
On AWS Windows, the process is stable at 160MB. Not 160GB, but 160MB.
Thanks for your attention to this. I will file an issue on GitHub if you want me to; please let me know.
Thanks for reporting the solution. It’s really puzzling that memory is managed so differently on AWS Linux. This probably makes it a Theano issue, but definitely open an issue (or comment on the one you dug up) with your findings.
I am having the same memory-leak problem. My input dataset is barely 200 MB.
The shape of the dataset is 1.2M x 19.
I have run sampling on my local machine. It works fine on a smaller sample of the dataset, but when I plug in the full dataset, it consumes more and more memory until it starts swapping to disk and eventually kills the session.
I also tried a heavier AWS SageMaker instance, but it does the same thing there.
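To make the failure mode concrete, here is a minimal sketch with made-up names (X_full, y_full) and a placeholder regression model, not my actual model:

```python
import numpy as np
import pymc3 as pm

# Placeholder data standing in for the real dataset (actual shape: 1.2M x 19)
X_full = np.random.randn(1200000, 19)
y_full = X_full.dot(np.ones(19)) + np.random.randn(1200000)

# A random subset like this samples without trouble...
idx = np.random.choice(X_full.shape[0], size=50000, replace=False)
X, y = X_full[idx], y_full[idx]

with pm.Model():
    beta = pm.Normal("beta", mu=0, sd=1, shape=X.shape[1])
    sigma = pm.HalfNormal("sigma", sd=1)
    pm.Normal("obs", mu=pm.math.dot(X, beta), sd=sigma, observed=y)
    trace = pm.sample(1000)

# ...but swapping in X_full / y_full is where the runaway memory use appears.
```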