What makes sequential Bayesian updating more challenging than full-batch updating?

Suppose we want to efficiently compute the posterior distribution of a regression model's parameters as data arrives in a continuous stream. The sequential nature of Bayesian updating would seem ideal for this. However, I have found that PyMC does not support updating posteriors sequentially, and I am not sure whether any other probabilistic programming packages do.

I am curious why this is difficult to achieve. Could it be that parameters which are independent under the prior become correlated in the posterior, and that this correlation is silently lost when the posterior is reused as a prior?

I would like to understand the underlying reasons, and I am also interested in hearing about techniques for addressing this challenge.

Welcome!

I would suggest taking a look at this notebook as well as the histogram_approximation distribution and prior_from_idata() function in PyMC-experimental. Once you play around with those (even with some toy problems), you might have a more intuitive grasp of the differences between batch and incremental approaches to updating.
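For a concrete picture of the incremental approach, the notebook does roughly the following: fit on the first batch, turn each parameter's posterior samples into a `pm.Interpolated` density, and use that as the prior for the next batch. Here is a minimal sketch along those lines (the data and the `mu`/`obs` model are toy placeholders, not taken from the notebook):

```python
import numpy as np
import pymc as pm
from scipy import stats

def from_posterior(name, samples):
    """Turn 1-D posterior samples into an Interpolated prior via a KDE."""
    smin, smax = samples.min(), samples.max()
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)
    # Pad the support so the new prior decays to zero in the tails
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return pm.Interpolated(name, x, y)

rng = np.random.default_rng(0)
batches = [rng.normal(1.5, 1.0, size=50) for _ in range(3)]  # streamed data

trace = None
for batch in batches:
    with pm.Model():
        if trace is None:
            mu = pm.Normal("mu", 0.0, 10.0)  # initial prior
        else:
            # Yesterday's posterior becomes today's prior
            mu = from_posterior("mu", trace.posterior["mu"].values.ravel())
        pm.Normal("obs", mu=mu, sigma=1.0, observed=batch)
        trace = pm.sample(progressbar=False)
```

Note that this rebuilds each parameter's marginal independently, so any correlation between parameters in the joint posterior is lost at every step, which is exactly the concern raised in the question. As I understand it, `prior_from_idata()` instead fits a multivariate normal to the posterior samples, which preserves those correlations.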


The difficulty is that the intermediate posteriors don't have a closed form, and MCMC samplers need mathematical densities to do their job.

The suggestions above are all ways to re-derive a density from the posterior histogram, but how well they work in practice is hard to predict. Often it's faster and safer to just resample with the whole dataset.
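"Resample with the whole dataset" just means keeping the raw data around and rerunning MCMC on everything seen so far; something like the following (same toy model and variable names as the sketch above):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
batches = [rng.normal(1.5, 1.0, size=50) for _ in range(3)]

seen = []
for batch in batches:
    seen.append(batch)
    with pm.Model():
        mu = pm.Normal("mu", 0.0, 10.0)  # the original prior, unchanged
        pm.Normal("obs", mu=mu, sigma=1.0, observed=np.concatenate(seen))
        trace = pm.sample(progressbar=False)
```

The cost grows with the dataset, but there is no approximation error from reconstructing the prior.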

The textbook cases always work with simple conjugate prior models.
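For reference, this is the conjugate situation where sequential updating is exact: with a Beta prior on a Bernoulli success probability, the posterior is again a Beta, so batch-by-batch updates and a single full-data update give the identical posterior:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.binomial(1, 0.3, size=100)  # Bernoulli(0.3) draws

# Beta(alpha, beta) prior; the Beta posterior has a closed form:
# alpha += number of successes, beta += number of failures.
alpha, beta = 1.0, 1.0
for batch in np.array_split(data, 10):  # ten sequential updates
    alpha += batch.sum()
    beta += len(batch) - batch.sum()

# One full-batch update gives the identical posterior
alpha_full = 1.0 + data.sum()
beta_full = 1.0 + len(data) - data.sum()
assert (alpha, beta) == (alpha_full, beta_full)
```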


I think GFlowNets can help with sequential updating without the limitations of variational inference (which is, I guess, related to what @ricardoV94 meant by "the textbook cases always work with simple conjugate prior models"): see Section 4 of the linked Notion document. This is what I suggested implementing in Sampling with a diffusion model.
