What makes sequential Bayesian updating more challenging than full-batch updating?

Suppose we want to efficiently compute the posterior distribution of a regression model's parameters as data arrives in a continuous stream. The sequential nature of Bayesian updating would seem ideal for this. However, I have found that PyMC does not support updating posteriors sequentially, and I am not sure whether any other probabilistic programming packages do.

I am curious why this is difficult to achieve. Could it be that parameters which are independent under the prior become correlated in the posterior, and that this correlation is silently lost when the posterior is reused as a prior?

I would like to understand the underlying reasons, and I am also interested in hearing about techniques for addressing this challenge.

Welcome!

I would suggest taking a look at this notebook as well as the histogram_approximation distribution and prior_from_idata() function in PyMC-experimental. Once you play around with those (even with some toy problems), you might have a more intuitive grasp of the differences between batch and incremental approaches to updating.
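For a concrete picture of the incremental approach, the notebook does roughly the following: fit on the first batch, turn each parameter's posterior samples into a `pm.Interpolated` density, and use that as the prior for the next batch. Here is a minimal sketch along those lines (the data and the `mu`/`obs` model are toy placeholders, not taken from the notebook):

```python
import numpy as np
import pymc as pm
from scipy import stats

def from_posterior(name, samples):
    """Turn 1-D posterior samples into an Interpolated prior via a KDE."""
    smin, smax = samples.min(), samples.max()
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)
    # Pad the support so the new prior decays to zero in the tails
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return pm.Interpolated(name, x, y)

rng = np.random.default_rng(0)
batches = [rng.normal(1.5, 1.0, size=50) for _ in range(3)]  # streamed data

trace = None
for batch in batches:
    with pm.Model():
        if trace is None:
            mu = pm.Normal("mu", 0.0, 10.0)  # initial prior
        else:
            # Yesterday's posterior becomes today's prior
            mu = from_posterior("mu", trace.posterior["mu"].values.ravel())
        pm.Normal("obs", mu=mu, sigma=1.0, observed=batch)
        trace = pm.sample(progressbar=False)
```

Note that this rebuilds each parameter's marginal independently, so any correlation between parameters in the joint posterior is lost at every step, which is exactly the concern raised in the question. As I understand it, `prior_from_idata()` instead fits a multivariate normal to the posterior samples, which preserves those correlations.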


The difficulty is that the intermediate posteriors don't have a closed form, and MCMC samplers need mathematical densities to do their job.

The suggestions above are all ways to re-derive a density from the posterior histogram, but how well they work in practice is hard to predict. Often it's faster and safer to just resample with the whole dataset.
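"Resample with the whole dataset" just means keeping the raw data around and rerunning MCMC on everything seen so far; something like the following (same toy model and variable names as the sketch above):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
batches = [rng.normal(1.5, 1.0, size=50) for _ in range(3)]

seen = []
for batch in batches:
    seen.append(batch)
    with pm.Model():
        mu = pm.Normal("mu", 0.0, 10.0)  # the original prior, unchanged
        pm.Normal("obs", mu=mu, sigma=1.0, observed=np.concatenate(seen))
        trace = pm.sample(progressbar=False)
```

The cost grows with the dataset, but there is no approximation error from reconstructing the prior.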

The textbook cases always work with simple conjugate prior models.
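For reference, this is the conjugate situation where sequential updating is exact: with a Beta prior on a Bernoulli success probability, the posterior is again a Beta, so batch-by-batch updates and a single full-data update give the identical posterior:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.binomial(1, 0.3, size=100)  # Bernoulli(0.3) draws

# Beta(alpha, beta) prior; the Beta posterior has a closed form:
# alpha += number of successes, beta += number of failures.
alpha, beta = 1.0, 1.0
for batch in np.array_split(data, 10):  # ten sequential updates
    alpha += batch.sum()
    beta += len(batch) - batch.sum()

# One full-batch update gives the identical posterior
alpha_full = 1.0 + data.sum()
beta_full = 1.0 + len(data) - data.sum()
assert (alpha, beta) == (alpha_full, beta_full)
```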


I think GFlowNets can help with sequential updating without the limitations of variational inference (which is, I guess, related to what @ricardoV94 meant by "the textbook cases always work with simple conjugate prior models"): see Section 4 of the linked Notion document. This is what I suggested implementing in Sampling with a diffusion model.
