Online update as new data comes in

Can I use the with context for online updating as new data comes in? Specifically, suppose I train a model with some initial data; to continue training on new data, can I do something like this:

    with model:
        pm.set_data({"data": ...})  # set the new data on the existing model
        pm.sample(**kwargs)

The reason I am asking is that none of the threads on sequential or online updating as new data comes in mentions this seemingly simple method. The snippet above could be wrapped in a function and called whenever new data arrives, fitting the model on it. Let me know your thoughts.

I think it depends critically on what you mean by “continue”. Your approach will generate fresh estimates based on the data you supply. But any prior sampling will be ignored. So if you are augmenting your data by adding the newest time period to an ever-expanding data set, then your new estimates would reflect all of your data, but you would still be starting your sampling from scratch. Alternatively, if you are replacing yesterday’s data with today’s, then your estimates will only reflect that newest data. If you want to update incrementally, you should take a look here.
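To make the two cases concrete, here is a minimal sketch of that workflow (the toy normal model and the variable names are illustrative, and depending on your PyMC version pm.Data may need to be pm.MutableData for pm.set_data to work):

    import numpy as np
    import pymc as pm

    old_batch = np.random.normal(0.0, 1.0, size=100)
    new_batch = np.random.normal(0.2, 1.0, size=50)

    with pm.Model() as model:
        # data container so the observations can be swapped out later
        data = pm.Data("data", old_batch)
        mu = pm.Normal("mu", mu=0.0, sigma=10.0)  # this prior is fixed at definition time
        pm.Normal("obs", mu=mu, sigma=1.0, observed=data)
        idata = pm.sample()

    # Case 1: augment -- refit on the ever-expanding data set.
    # The new estimates reflect all of the data, but sampling still starts
    # from the original prior on mu; the earlier posterior is not reused.
    with model:
        pm.set_data({"data": np.concatenate([old_batch, new_batch])})
        idata_all = pm.sample()

    # Case 2: replace -- refit on the newest batch only.
    # These estimates reflect only new_batch.
    with model:
        pm.set_data({"data": new_batch})
        idata_new = pm.sample()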

Thank you for your response. The scenario is exactly the one described in the link; however, the problem with that documentation is that it treats all the RVs as independent, which is usually not the case, and the Interpolated approach won't work for multivariate priors (at least I couldn't find a way). A rough sketch of that approach is below.
Let me reframe the problem: if I use the with context, will the model be trained from scratch on the new data, or will the parameters learnt from the previously available data act as the prior and be updated by the new data?
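For reference, the linked approach builds each new prior from the previous posterior samples, one parameter at a time, roughly like this (a minimal sketch adapted from that pattern; it handles every RV independently, which is exactly the limitation I mean):

    import numpy as np
    import pymc as pm
    from scipy import stats

    def from_posterior(name, samples):
        """Turn 1-D posterior samples into an Interpolated prior for the next fit."""
        smin, smax = samples.min(), samples.max()
        width = smax - smin
        x = np.linspace(smin, smax, 100)
        y = stats.gaussian_kde(samples)(x)
        # pad the grid so the new prior has some support outside the old samples
        x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
        y = np.concatenate([[0.0], y, [0.0]])
        return pm.Interpolated(name, x, y)

    # inside the next model, each parameter gets its own 1-D prior:
    # with pm.Model():
    #     mu = from_posterior("mu", idata.posterior["mu"].values.flatten())
    #     ...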


The former: the model will be trained from scratch on the new data. In your model you specify priors. When you call

    idata = pm.sample(return_inferencedata=True)

you get samples from the posterior (they are stored in idata.posterior). When you re-enter the model context (i.e., with my_model:), your model still retains the priors you specified when you originally defined it. The model context never “knows” about the posterior samples.
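So, concretely (a minimal sketch, with my_model and new_data standing in for whatever you have):

    with my_model:
        pm.set_data({"data": new_data})                       # only the observed data changes
        idata_second = pm.sample(return_inferencedata=True)

    # idata_second is conditioned on the *original* priors plus new_data;
    # the posterior samples in idata from the first call play no role here.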


Hello @Chandan_Gupta, I am working on a Bayesian network that utilizes new observations as they come in. The problem @cluhmann describes is that you initialize the model with priors, and as they wrote:

The model context never “knows” about the posterior samples.

Let’s say that your system has 2 measurements available in a series.

First measurement

You update the model with the measurement via the observed= keyword in the definition of the probability distribution (e.g., pm.Normal(..., observed=np.array([your_observation_data]))). You get the posterior via return_inferencedata=True in pm.sample(), but now you need to turn that posterior into your next prior.
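For a toy example, the first fit might look like this (the normal model and its numbers are just placeholders for whatever your actual model is):

    import numpy as np
    import pymc as pm

    first_batch = np.array([2.1, 1.8, 2.4])  # first measurement(s)

    with pm.Model():
        mu = pm.Normal("mu", mu=0.0, sigma=5.0)  # initial prior
        pm.Normal("obs", mu=mu, sigma=1.0, observed=first_batch)
        idata = pm.sample(return_inferencedata=True)

    # this posterior is what has to feed the prior for the second measurement
    posterior_mu = idata.posterior["mu"]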

Second measurement

To utilize the previous measurement, you need to use the posterior as a new prior, but the with context only contains the original priors. So to use the previous measurement you can re-initialize the model with new parameters. I solve this problem by having a pm.Dirichlet() distribution over other distributions, e.g. mixtures that take the weights as a parameter, such as pm.NormalMixture(). The Dirichlet distribution takes an alpha= keyword, the vector that forms the probabilities on the K-1 simplex (a triangle generalized to higher dimensions). The alpha vector can be roughly initialized directly from the probabilities, or it can be fit with a hierarchical model for better precision. I have shown this here.
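A rough sketch of that re-initialization pattern (the two-component mixture below is a toy stand-in for my actual model, and mapping the posterior over the weights back to a new alpha is done here with a simple moment-matching heuristic, which is only one of several reasonable choices):

    import numpy as np
    import pymc as pm

    def fit_batch(alpha, data):
        """Rebuild the model with the current Dirichlet alpha and fit it to one batch."""
        with pm.Model():
            w = pm.Dirichlet("w", a=alpha)  # prior over the mixture weights
            pm.NormalMixture(
                "obs", w=w,
                mu=np.array([-2.0, 2.0]),
                sigma=np.array([1.0, 1.0]),
                observed=data,
            )
            return pm.sample()

    first_batch = np.random.normal(-2.0, 1.0, size=30)
    second_batch = np.random.normal(2.0, 1.0, size=30)

    alpha = np.array([1.0, 1.0])  # flat initial Dirichlet prior
    idata1 = fit_batch(alpha, first_batch)

    # Turn the posterior over w into the next prior's alpha
    # (moment matching: alpha_k ~ mean_k * concentration).
    w_mean = idata1.posterior["w"].mean(dim=("chain", "draw")).values
    concentration = 50.0  # assumed; controls how strongly batch 1 constrains batch 2
    alpha = w_mean * concentration

    # The model is re-initialized with the updated alpha for the next batch.
    idata2 = fit_batch(alpha, second_batch)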

Although this reply is kinda specific, I hope the general problem of re-initializing the distribution got through. Many seemingly difficult problems can be solved using hierarchical modelling, where you model the hyperpriors of the distributions instead of the distributions directly.