Larger dataset linear regression GPU memory problems with NUTS for larger N and D (batch training?)

junpenglao · October 3, 2017, 8:44am

Partitioning data in MCMC takes some care to do correctly. Out of the box, PyMC3 currently offers SGFS, which you can try (it is an experimental sampler, so use it with caution):
https://github.com/pymc-devs/pymc3/blob/master/docs/source/notebooks/sgfs_simple_optimization.ipynb

Alternatively, you can partition your data by hand, fit several smaller models with NUTS, and then combine the results post hoc. The more formal treatment is expectation propagation (e.g., https://arxiv.org/pdf/1412.4869.pdf), for which you can find some Stan code here: gelman/ep-stan on GitHub
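
To make the "partition by hand and combine post hoc" idea concrete, here is a rough sketch. Note that it is not expectation propagation: it uses the simpler consensus Monte Carlo rule, where draws from each shard are averaged with weights equal to the inverse of that shard's sample covariance. To keep it self-contained I am assuming a flat prior and a known noise scale `sigma`, so the per-shard posterior is Gaussian and can be drawn from directly; `subposterior_draws` is a hypothetical stand-in for what would be a NUTS run per shard in a real model. All sizes and names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (sizes are arbitrary, for illustration only).
N, D, K, S = 6000, 3, 4, 2000  # rows, features, shards, draws per shard
true_beta = np.array([1.5, -2.0, 0.5])
sigma = 1.0  # ASSUMPTION: noise scale is known
X = rng.normal(size=(N, D))
y = X @ true_beta + sigma * rng.normal(size=N)


def subposterior_draws(X_k, y_k, n_draws):
    """Hypothetical stand-in for a NUTS run on one shard.

    With a flat prior and known sigma, the shard posterior over the
    coefficients is Gaussian, so we sample it in closed form.
    """
    precision = X_k.T @ X_k / sigma**2
    cov = np.linalg.inv(precision)
    mean = cov @ (X_k.T @ y_k) / sigma**2
    return rng.multivariate_normal(mean, cov, size=n_draws)


def consensus_combine(draws_per_shard):
    """Combine shard draws with inverse-covariance weights.

    Draw s of the result is (sum_k W_k)^-1 sum_k W_k theta_{k,s},
    where W_k is the inverse sample covariance of shard k.
    """
    weights = [np.linalg.inv(np.cov(d, rowvar=False)) for d in draws_per_shard]
    total_inv = np.linalg.inv(sum(weights))
    weighted = sum(d @ w.T for d, w in zip(draws_per_shard, weights))
    return weighted @ total_inv.T


# Split the row indices at random into K shards and sample each one.
shards = np.array_split(rng.permutation(N), K)
draws = [subposterior_draws(X[idx], y[idx], S) for idx in shards]
combined = consensus_combine(draws)

# Reference: least-squares fit on the full data set.
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
print("combined posterior mean:", combined.mean(axis=0))
print("full-data least squares:", beta_full)
```

The weighted average is only exact when the shard posteriors are Gaussian, so treat it as an approximation otherwise. With a non-flat prior, each shard should use the prior raised to the power 1/K so that it is not counted K times in the combined result.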
