I’m looking for guidance on the proper approach in PyMC to:

1. Use a model fit on historical data
2. Update it with new observations
3. Generate forecasts for future dates
## Election Forecasting Context

I’m building an election forecasting model (as described here) with these components:

- **Historical data**: past elections with polling data, results, and other covariates
- **Current election**: real polling data arriving periodically before the election
- **Forecast need**: predict party vote shares up to election day

My model captures polling dynamics including:

- Time-varying party support
- Pollster house effects
- Gaussian process components for temporal dynamics
- Other factors like incumbency
## Current Challenge

I’ve successfully:

1. Fit the model using historical election data
2. Obtained posterior distributions for all parameters

Now I need to:

1. Incorporate new polls for the current election as they become available
2. Generate forecasts for future dates up to the election
3. Properly quantify uncertainty that reflects:
   - High certainty at dates with real polls
   - Increasing uncertainty as we move away from observed polls
## Specific Questions

1. What’s the proper way in PyMC to incorporate new observations (polls) when making forecasts with a pre-fit model?
2. Should I create a separate “forecast model” that uses the posterior from my training model as priors?
3. How do I ensure that my real polls properly constrain the forecast uncertainty (so that uncertainty is minimal at real poll dates and grows as we move away from them)?
4. Is there a standard approach in PyMC for this kind of “update with new data + forecast” problem?

Any examples or guidance would be greatly appreciated. I’m specifically interested in understanding the right architectural approach rather than specific implementation details.
There is some discussion here about how to use posteriors from one model as priors in the next:

I don’t know what the best practice is, but that topic has some detailed discussion and references to the literature. It also seems there are already some implementations of what was discussed there in the pymc_experimental library:
Hi @bernardocaldas , and well done on the new model!
Unless you’re using state space models for the time series part, the workflow shouldn’t be any different than a classic PyMC model, so you should be able to use the set_data function.
I guess the main question is how to add additional observations. Does replacing the observed_polls data with the recently added polls + doing posterior sampling do anything to condition the posterior to the new polls?
This is not how polls work. We have very high uncertainty even on poll day due to several factors: simple sampling variance (we only measure a tiny fraction of the population, so the result has high uncertainty), differential non-response (who responds to a poll depends on what’s been going on), and the uncertainty in our attempts to adjust non-representative polls to the general population.
This is usually a side-effect of a time-series model. Sounds like you’re using a GP, where this will happen naturally. You’ll have to be careful to calibrate the covariance kernel to one that makes sense for this.
How do you fit that effect and adjust for the differences among polls?
The bigger effect is differential non-response.
You could use something like Sequential Monte Carlo (SMC), or you could just refit the whole model. For local drifts, you can use importance sampling like in LOO.
Don’t do that artificially. And make sure to account for the very high uncertainty of the data from a given poll.
If you can write a single model that can accommodate any number of polls, it shouldn’t be a problem updating it and refitting it as new data come in.
Ah, if you want to update the posterior parameters, then you need to re-sample the model, and the easiest approach programmatically will be to just do it on the full data, including the new polls.
If you just swap in the new polls instead of the old polls you used to sample, then do sample_posterior_predictive, this will give you predictions of course, but they will be conditioned on parameters learned during sampling with old polls.
As Bob was saying, the GP will do that automatically.