How to choose the number of CPUs and Memory Size for Bayesian models using PyMC

xiaodao · March 15, 2023, 3:51pm

Hi PyMC community,

I am working on a Bayesian modelling project using PyMC (>= 5) and nutpie (0.5.1). I have some findings and questions to share:

Environment: I created a conda environment with Python 3.10.9 and nutpie=0.5.1 (the latest version on conda-forge as of March 2023) on a Linux virtual machine instance. If no PyMC version is specified, installing nutpie=0.5.1 automatically pulls in PyMC 5.0.2. If I install PyMC>=5.1.x first, installing nutpie=0.5.1 via conda downgrades PyMC back to 5.0.2. I am totally fine with this combination of PyMC and nutpie.
Chains and CPUs: pm.sample() lets users choose the number of chains and the number of cores used for sampling. My understanding is that one chain can use at most one CPU and that PyMC can leverage at most 4 cores. If so, a virtual machine instance with 4 CPUs is all I need to sample 4 chains, and choosing 8 CPUs would waste resources. What about 8 chains? Does using 8 chains give more reliable estimates?
Memory size: Does PyMC use a large amount of RAM during sampling? Should I increase the RAM as I increase the number of draws and tuning steps?
My Python kernel dies when I estimate a hierarchical Bayesian model with a large number of parameters. For example, I am tweaking a two-layer hierarchical Bayesian model with 1700+ observations (rows in my data frame), 30+ variables (columns), and 300+ parameters (hyperparameters and parameters combined). The kernel dies when I run pm.sample(draws=1000, tune=1000, chains=2, cores=4). The kernel might not die if I switch from a machine instance with 4 CPUs and 16 GB of memory to one with 72 CPUs and 100+ GB of memory. Any thoughts?
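To make the two sizing questions concrete, here is a rough back-of-the-envelope sketch in plain Python (no PyMC needed). The helper names effective_parallel_chains and trace_size_mb are made up for illustration. It assumes the documented pm.sample defaults (cores defaults to the CPU count, capped at 4, unless set explicitly), that each chain runs in its own process, that only the post-tuning draws are kept, and that every stored value is a float64. It ignores model compilation, per-process copies of the data, and any Deterministic variables, so treat the result as a lower bound rather than a real memory profile.

```python
import os


def effective_parallel_chains(chains, cores=None):
    """Return how many chains can run at the same time.

    Assumption: one chain per process, and when `cores` is not given
    it defaults to the CPU count capped at 4 (the documented pm.sample
    default). Passing `cores` explicitly lifts that cap.
    """
    if cores is None:
        cores = min(os.cpu_count() or 1, 4)
    return min(chains, cores)


def trace_size_mb(draws, chains, n_values, bytes_per_value=8):
    """Rough size in MB of the stored posterior draws.

    Assumption: tuning draws are discarded and each value is a float64.
    """
    return draws * chains * n_values * bytes_per_value / 1e6


# Settings from the post: draws=1000, chains=2, cores=4, ~300 parameters.
print(effective_parallel_chains(chains=2, cores=4))        # 2
print(trace_size_mb(draws=1000, chains=2, n_values=300))   # 4.8
```

Under these assumptions, chains=2 with cores=4 keeps only two cores busy, and the stored draws come to a few megabytes, which suggests the draws alone are unlikely to be what exhausts 16 GB.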

xiaodao · April 19, 2023, 2:39am

The discussion in this post cleared up the questions I had:
