Suppose we want to efficiently compute the posterior distribution of a regression model's parameters as data arrives in a continuous stream. The sequential-updating character of Bayesian inference should make this natural. However, I have found that PyMC does not support updating posteriors sequentially, and I am not sure whether any other probabilistic programming package offers this feature.
I am curious why this is difficult. Could it be that the parameters, though given independent priors, become correlated in the posterior, and that this correlation is not captured when updating?
I am keen to delve deeper into this issue and understand the underlying reasons. Additionally, I am interested in hearing about techniques to address this challenge.
I would suggest taking a look at this notebook as well as the histogram_approximation distribution and prior_from_idata() function in PyMC-experimental. Once you play around with those (even with some toy problems), you might have a more intuitive grasp of the differences between batch and incremental approaches to updating.
The difficulty is that the intermediate posteriors don't have closed form, and MCMC samplers need mathematical densities to do their job.
The suggestions above are all ways to rederive a density from the posterior samples, but how well they work in practice is an open question. Often it's faster and safer to just resample with the whole dataset.
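To make the idea concrete, here is a minimal sketch (not the actual PyMC-experimental implementation) of rederiving a density from posterior samples with a kernel density estimate, using only NumPy and SciPy on toy data. The approximation could then serve as the prior for the next batch, but it only recovers a marginal density and is only as good as the fit, especially in the tails:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

# Toy "posterior samples" for one parameter after a first data batch.
# In a real workflow these would come from an MCMC trace.
rng = np.random.default_rng(0)
posterior_samples = rng.normal(loc=1.0, scale=0.5, size=5_000)

# Rederive an approximate density from the samples; this is what a
# histogram- or KDE-based "posterior as new prior" approach amounts to.
approx_prior = gaussian_kde(posterior_samples)

# Compare the approximation to the true generating density on a grid.
grid = np.linspace(-1.0, 3.0, 200)
max_err = np.max(np.abs(approx_prior(grid) - norm(1.0, 0.5).pdf(grid)))
print(f"max abs error of KDE vs true density: {max_err:.3f}")
```

Even in this easy one-dimensional case the approximation carries some error, and that error compounds over repeated updates, which is one reason resampling on the full dataset is often safer.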
The textbook cases always work with simple conjugate-prior models.
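For contrast, here is a toy Beta-Binomial example of that conjugate case, where sequential and batch updating agree exactly because the posterior stays in the prior's family (the batch sizes and counts here are made up):

```python
from scipy.stats import beta

# Conjugate Beta-Binomial: with prior Beta(a, b), observing k successes
# in n trials gives posterior Beta(a + k, b + n - k), which can be
# reused as the next prior with no approximation at all.
a, b = 1.0, 1.0  # uniform prior on the success probability

batches = [(7, 10), (3, 5), (12, 20)]  # (successes, trials) per batch

# Incremental: update the prior after each batch.
a_inc, b_inc = a, b
for k, n in batches:
    a_inc += k
    b_inc += n - k

# Batch: update once with all the data pooled.
k_all = sum(k for k, _ in batches)
n_all = sum(n for _, n in batches)
a_batch, b_batch = a + k_all, b + n_all - k_all

print(a_inc, b_inc, a_batch, b_batch)   # identical parameters
print(beta(a_inc, b_inc).mean())        # posterior mean of the rate
```

This exactness is special to conjugate families; for a general regression model the intermediate posteriors have no such closed form, which is the obstacle described above.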