Linear regression on a larger dataset: GPU memory problems with NUTS for larger N and D (batch training?)

Partitioning data for MCMC takes some care to get right. Out of the box, PyMC3 currently offers SGFS (Stochastic Gradient Fisher Scoring), which you can try. It is an experimental sampler, so use it with caution:
https://github.com/pymc-devs/pymc3/blob/master/docs/source/notebooks/sgfs_simple_optimization.ipynb

Alternatively, you can partition your data by hand, fit multiple smaller models with NUTS, and then combine the results post hoc. The principled treatment is expectation propagation (e.g., https://arxiv.org/pdf/1412.4869.pdf), for which you can find some Stan code in the gelman/ep-stan repository on GitHub.
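
To give a rough idea of what "partition by hand, then combine post hoc" can look like, here is a minimal NumPy sketch. Note that it is not the expectation propagation scheme from the paper: it uses a simpler precision-weighted averaging of draws (often called consensus Monte Carlo), which is only exact when the sub-posteriors are Gaussian. To keep it self-contained, the per-shard draws come from the closed-form posterior of a linear regression with known noise scale, as a stand-in for what NUTS would return. The sizes, the prior, and the helper name `gaussian_posterior` are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression data (hypothetical sizes).
N, D, K = 6000, 3, 4           # observations, features, number of shards
sigma = 1.0                    # noise sd, assumed known for simplicity
prior_var = 10.0               # N(0, prior_var * I) prior on the weights
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(N, D))
y = X @ w_true + sigma * rng.normal(size=N)


def gaussian_posterior(X, y, prior_precision):
    """Closed-form Gaussian posterior (mean, cov) of the weights."""
    precision = prior_precision * np.eye(X.shape[1]) + X.T @ X / sigma**2
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / sigma**2
    return mean, cov


# Full-data posterior, kept only as a reference to compare against.
full_mean, full_cov = gaussian_posterior(X, y, 1.0 / prior_var)

# Each shard uses the prior raised to the power 1/K (variance K * prior_var),
# so that the product of the K sub-posteriors matches the full posterior.
shards = np.array_split(rng.permutation(N), K)
draws, weights = [], []
for idx in shards:
    m, c = gaussian_posterior(X[idx], y[idx], 1.0 / (prior_var * K))
    # Stand-in for NUTS: in practice these draws come from the shard's trace.
    s = rng.multivariate_normal(m, c, size=4000)
    draws.append(s)
    # Weight each shard by the inverse of its sample covariance.
    weights.append(np.linalg.inv(np.cov(s, rowvar=False)))

# Combine the g-th draw of every shard by a precision-weighted average:
# theta_g = (sum_k W_k)^-1 * sum_k W_k theta_{k,g}
total_weight = sum(weights)
weighted_sum = sum(d @ W.T for d, W in zip(draws, weights))
combined = np.linalg.solve(total_weight, weighted_sum.T).T

print("full posterior mean:", full_mean)
print("combined draws mean:", combined.mean(axis=0))
```

With real NUTS output you would replace the `rng.multivariate_normal` line with the draws from each shard's trace. The key design point is the prior: each sub-model has to use a flattened prior, otherwise the prior gets counted K times when the pieces are combined.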