I think this kind of use case is at the boundary of what you can do with PyMC3 and numpyro, or any framework that tries to fit all the data at the same time. A huge problem here is numerical error: most PPLs compute sum(log_prob(proposal)) - sum(log_prob(current)), but at this scale you really need sum(log_prob(proposal) - log_prob(current)), or something like compensated summation (Kahan summation algorithm - Wikipedia)
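To make the cancellation point concrete, here is a minimal NumPy sketch (not PyMC3/numpyro code; the array of per-point log-probabilities is a stand-in for whatever the model actually computes) of both compensated summation and the sum-of-differences trick:

```python
import numpy as np

def kahan_sum(xs):
    """Compensated (Kahan) summation: carries along the low-order bits
    that plain accumulation drops at each addition."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for x in xs:
        y = x - comp
        t = total + y
        comp = (t - total) - y  # the part of y that total failed to absorb
        total = t
    return total

# Plain accumulation of ten 0.1's drifts off 1.0; the compensated loop does not.
naive = 0.0
for _ in range(10):
    naive += 0.1          # naive ends up at 0.9999999999999999
compensated = kahan_sum([0.1] * 10)

# The same issue in a Metropolis-style acceptance ratio: subtracting two huge
# totals cancels catastrophically, while differencing per data point first
# keeps the computation well conditioned.
rng = np.random.default_rng(0)
logp_current = rng.uniform(1e6, 2e6, size=1_000_000)      # stand-in log-probs
logp_proposal = logp_current + 1e-8                       # tiny per-point change
diff_of_sums = logp_proposal.sum() - logp_current.sum()   # fragile: huge - huge
sum_of_diffs = (logp_proposal - logp_current).sum()       # robust, ≈ 1e-2
```

The per-point numbers here are made up; the point is only that the well-conditioned form recovers the small log-prob difference that the naive form has to dig out of two enormous totals.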
Currently, TensorFlow Probability is the only one I am aware of that is making an effort to handle these use cases: Add `experimental_use_kahan_sum` argument to `tfd.Independent` and `t… · tensorflow/probability@ad9ccda · GitHub
(The work is very much ongoing, so there is no good example of how to use it in MCMC yet)
Otherwise, I recommend variational inference, where you can subsample the training data and handle large data sets with existing deep learning infrastructure (for both training the model and deploying it).
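The subsampling point is what makes VI scale: the likelihood part of the ELBO is a sum over observations, so an unbiased minibatch estimate (rescaled by N / batch_size) can stand in for the full-data sum, exactly like a deep learning training loop. A minimal NumPy sketch with a made-up Gaussian model (the model, names, and numbers are illustrative, not any library's API):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000
data = rng.normal(loc=2.0, scale=1.0, size=N)

def loglik(theta, x):
    # Per-observation Gaussian log-likelihood (unit variance), a stand-in
    # for whatever per-data-point log-density the model defines.
    return -0.5 * (x - theta) ** 2 - 0.5 * np.log(2 * np.pi)

theta = 2.0  # current value of the parameter being optimized

# Full-data likelihood term of the ELBO: one pass over all N points.
full = loglik(theta, data).sum()

# Minibatch estimate: sum over a random batch, rescaled by N / batch_size.
# Its expectation over batches equals the full-data term, so stochastic
# gradient methods can optimize the ELBO without ever touching all N
# points at once.
batch = rng.choice(data, size=1_000, replace=False)
estimate = (N / batch.size) * loglik(theta, batch).sum()
```

In a real VI setup the same rescaling sits inside the gradient estimator (e.g. minibatch ADVI / SVI), which is why the whole thing plugs into standard deep learning infrastructure.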