Best Practices for Time Series Forecasting

I’m wondering if there are some examples or docs on the canonical way to forecast time series. In particular, I’m a bit unclear on how to use coords and Data when updating covariates which are used for both inference and out of sample forecasting. Concretely, I have a simple model as follows (sudo code, full code below)

coords = {"date": train_dates}
with pm.Model() as ts_model:

    for k in coords.keys():
        ts_model.add_coord(k, coords[k], mutable=True)

    x = pm.MutableData(
        name="x",
        value=x_train,
        dims=["date"],
    )

    beta = pm.Normal("beta", mu=0, sigma=1)

    mu = pm.Deterministic(
        "mu",
        var=beta * x,
        dims="date",
    )
    sigma = pm.HalfNormal("sigma", sigma=1)

    y = pm.Normal(
        "y",
        mu=mu,
        sigma=sigma,
        observed=y_train,
        dims="date",
    )
  1. Should covariate x that is used for inference and as inputs for forecasting be MutableData or should these be two different random variables in the model DAG, i.e. x_train and x_test?
  2. How should coords be used when they need to change for in sample inference and out of sample forecasting? My impression is this is only supported outside of the context manager.
  3. What are the advantages / use cases of using of forecasting using pymc.sample_posterior_predictive(predictions=True) vs. updating the observed data via model.set_data and then using pymc.sample_posterior_predictive(predictions=False)

I’ve come across several overviews of timeseries and model predictions (linked below) but am struggling to distill out the current best practices (for reference I am using pymc v4.2.1).

This example takes the approach of using separate coords for forecasting and fitting and also uses pymc.sample_posterior_predictive(predictions=True) Forecasting with Structural AR Timeseries — PyMC example gallery

This recommends using MutableData for the observed data and then resetting the observed data for the forecast period Predict with new coords leads to conflicting sizes - #2 by lucianopaz

Here is a full example of a simple model Y ~ Normal(beta*X, sigma) where both Y and X are time series. I have set my model up to train on a training set and then forecast on an out of sample test set. The model code I have is set up as follows

import pymc as pm
import arviz as az
import numpy as np
import pandas as pd
from scipy.signal import medfilt

np.random.seed(1337)

train_dates = pd.date_range("2020-01-01", "2020-12-31")
test_dates = pd.date_range("2021-01-01", "2021-01-31")

x_train = pd.Series(np.random.randn(len(train_dates)), index=train_dates) * 3
x_test = pd.Series(np.random.randn(len(test_dates)), index=test_dates) * 3


real_beta = 0.8

y_train = real_beta * x_train + pd.Series(
    np.random.randn(len(train_dates)), index=train_dates
)
y_test = real_beta * x_test + pd.Series(
    np.random.randn(len(test_dates)), index=test_dates
)


y_test_dummy = pd.Series(0, index=test_dates)


coords = {"date": train_dates}

with pm.Model() as ts_model:

    for k in coords.keys():
        ts_model.add_coord(k, coords[k], mutable=True)

    x = pm.MutableData(
        name="x",
        value=x_train,
        dims=["date"],
    )

    beta = pm.Normal("beta", mu=0, sigma=1)

    mu = pm.Deterministic(
        "mu",
        var=beta * x,
        dims="date",
    )
    sigma = pm.HalfNormal("sigma", sigma=1)

    y = pm.Normal(
        "y",
        mu=mu,
        sigma=sigma,
        observed=y_train,
        dims="date",
    )

    # Fit to the data
    model_trace = pm.sample(
        chains=4,
        idata_kwargs={"log_likelihood": True},
    )
    model_posterior_predictive = pm.sample_posterior_predictive(trace=model_trace)


# Predict
test_coords = {"date": test_dates}

with ts_model:
    ts_model.set_data("x", x_test, coords=test_coords)
    pred_posterior_predictive = pm.sample_posterior_predictive(model_trace.posterior, predictions=True)

This gives the following outputs

az.extract_dataset(model_posterior_predictive.posterior_predictive).y.mean("sample").to_series().plot(title="Train Data")
model_posterior_predictive.observed_data.y.to_series().plot(color="k", marker="x", linestyle="None")


plt.figure()
az.extract_dataset(pred_posterior_predictive.predictions).y.mean("sample").to_series().plot(title="Test Data")
y_test.plot(color="k", marker="x", linestyle="None")

Thanks for starting this conversation, I’m very interested in it as well. I have tried using MutableData to make forecasts, and it works for simple linear models like in your example but if I include e.g. AR terms then the estimates between in-sample and out-of-sample don’t align. I’m guessing in that case it’s necessary to not just mutate the data, but also create a separate RV that forces the alignment like the Forecasting with Structural AR Timeseries example does using DiracDelta.

@jessegrabowski might be equipped to make suggestions.

We should first distinguish between recursive and non-recursive time-series models. @kforeman hinted at this distinction, which I just want to explore formally. Before we start, I want to address the numbered questions in the OP:

  1. In the non-recursive case, use MutableData. In the recursive case, it might be necessary to do something more complex. I will present some solutions that work for me, but @ricardoV94 is working on some more sophisticated/generalization approaches with model cloning and reconstruction here. I won’t address any of these, but suffice to say that forcasting should be better more convenient in the future. But that’s just my forecast :wink:

  2. Since the length of variables is going to change, you will need to either omit coords or re-set them after estimation. Personally I omit them at sampling time, because it’s a hassle.

  3. The predictions=True flag has nothing at all to do with forecasting. AFAIK, it just makes a new group in your idata object. This allows you to, in the same object, hold posterior predictive samples for both the training and test data. If you do not set new data with pm.set_data and run pm.sample_posterior_predictive(predictions=True), you will just end up with the training distribution, stored in a different place.

On to some specifics:

Non-Recursive Models

These are models in which time simply enters as a covariate. The model the OP posted sort of falls into this class. But actually the model he posted is more like a “cross-section of times”, since there is no time variable. I am splitting hairs here because in order to forecast, as opposed to predict, we should be able to arbitrarily extend the data out into the future. In the posted case, we need the covariate observations from the future to make forecasts, which is of course impossibe.

What I have in mind is instead more like a deterministic trend model: y_t = \alpha + \gamma t + \epsilon_t. Here, we have a time variable t, which we can arbitrarily extend out into the future because it’s just a counter. Here’s a code example:

Example 1: Deterministic Trend

Data Generation

true_alpha = 1
true_gamma = 0.2
true_sigma = 1.5
T = 100

noise = rng.normal(scale=true_sigma, size=T)
t = np.arange(T)
data = true_alpha + true_gamma * t + noise
train_data = data[:90]
test_data = data[-10:]

plt.plot(train_data, color='tab:blue')
plt.plot(np.arange(90, 100), test_data, ls='--', color='tab:red')

Sampling

with pm.Model() as det_trend:
    t_pt = pm.MutableData('t', t[:90])
    alpha = pm.Normal('alpha')
    gamma = pm.Normal('gamma')
    
    mu = pm.Deterministic('mu', alpha + gamma * t_pt)
    sigma = pm.Exponential('sigma', 1)
    
    y_hat = pm.Normal('y_hat', mu=mu, sigma=sigma, observed=train_data, shape=t_pt.shape)
    idata = pm.sample()

Prediction and Forecasting

with det_trend:
    #in-sample 
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)
    
    #out-of-sample
    pm.set_data({'t':t[-10:]})
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True, predictions=True)

In this case I generated the test data so I have it to plot and examine, but you can see that in principle I can extend these forecasts out forever by just setting the t variable to whatever I want, even if we don’t have any new data yet. Obviously in this example it doesn’t generate very exciting forecasts, but this methodology is actually very powerful because nothing prevents you from using non-linear functions of t. In this next example I make forecasts on real data using a prophet-like model, which uses a piecewise linear trend and Fourier seasonality to create rich time dynamics that can be arbitrarily extended into the future:

Example 2: Prophet

Data are from Forecasting: Principles and Practice. I centered and normalized the values so I could autopilot though the prior definitions.

Model

First I make some functions that take in t and return some non-linear transformations:

def create_piecewise_trend(t, t_max, n_changepoints):    
    s = pt.linspace(0, t_max, n_changepoints+2)[1:-1]
    A = (t[:, None] > s)*1
    
    return A, s

def create_fourier_features(t, n, p=365.25):
    x = 2 * np.pi * (pt.arange(n)+1) * t[:, None] / p
    return pt.concatenate((pt.cos(x), pt.sin(x)), axis = 1)

def generate_features(t, t_max, n_changepoints=10, n_fourier=6, p=365.25):
    A, s = create_piecewise_trend(t, t_max, n_changepoints)
    X = create_fourier_features(t, n_fourier, p)
    
    return A, s, X

Then use these in my PyMC model:

t = np.arange(df.shape[0])

# This value isn't on the computation graph (it's not computed dynamically from `t_pt`)
t_max = max(t)

with pm.Model() as prophet_model:
    t_pt = pm.MutableData('t', t)

    # We have monthly data, so p=12 leads to annual seasonality
    A, s, X = generate_features(t_pt, t_max, n_changepoints=10, n_fourier=6, p=12)
    
    initial_slope = pm.Normal('initial_slope')
    initial_intercept = pm.Normal('initial_intercept')
    
    # n_changepoint offsets terms to build the peicewise trend
    deltas = pm.Normal('offset_delta', shape=(10,))
        
    intercept = initial_intercept + ((-s * A) * deltas).sum(axis=1)
    slope = initial_slope + (A * deltas).sum(axis=1)
    
    # n_fourier * 2 seasonal coefficients
    beta = pm.Normal('beta', size=12)
    
    mu = pm.Deterministic('mu', intercept + slope * t_pt+ X @ beta)
    sigma = pm.Exponential('sigma', 1)
    y_hat = pm.Normal('y_hat', mu=mu, sigma=sigma, observed=df.values.ravel(), shape=t_pt.shape[0])
    
    idata = pm.sample()

Forcasting

with prophet_model:
    #in-sample 
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)
    
    #out-of-sample, predict 3 years of home sales
    last_t = t[-1]
    forcast_t = np.arange(last_t, last_t + 36)
    pm.set_data({'t':forcast_t})
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True, predictions=True)

Now I’m working with real data, and I didn’t set aside any test set on purpose. I want to demonstrate that you can just plug in any value for t that you want and just go hog wild predicting the future. What could possibly go wrong?

Recursive Models

Recursive models are slightly more complex, because we are learning some policy function y_{t+1} = g(y_t). Making forecasts, then, isn’t as simple as just doing pm.set_data, but it also isn’t much more complex. Basically, we’re going to:

  1. Have an unknown distribution x0 that descirbes the initial state of the system before we see data
  2. From this initial x0, sample the posterior \theta s that parameterize the g function (including x0)
  3. Replace x0 with a DiracDelta(y[-1]), a spike distribution over the last observed data point
  4. Simulate forward using the g function

Here’s a simple example using a Guassian random walk. In this case, y_{t+1} = g(y_t) = y_t + \epsilon_t, and we just have to learn one parameter, \sigma. It’s easy to implement this model in PyMC using .cumsum():

Generate Data

true_sigma = 0.1
T = 100

innovations = rng.normal(scale=true_sigma, size=T)
data = innovations.cumsum()
plt.plot(data)

PyMC Model

with pm.Model() as grw:
    sigma_innov = pm.Exponential('sigma_innov', 1)
    innovs = pm.Normal('innovations', sigma=sigma_innov, size=100)
    
    sigma = pm.Exponential('sigma', 1)
    y_hat = pm.Normal('y_hat', mu=innovs.cumsum(), sigma=sigma, observed=data)
    
    idata = pm.sample(init='jitter+adapt_diag_grad', target_accept=0.99)

Prediction and Forecasting

# First the easy part, in-sample prediction
with grw:
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)

# Next simulate the recursive model
H = 25
simulations = np.empty((n_chains, n_draws, H))
x0 = data[-1]

simulations[:, :, 0] = x0

# simulate the forward model
for t in range(1, H):
    simulations[:, :, t] = simulations[:, :, t-1] + rng.normal(scale=idata.posterior.sigma_innov) + rng.normal(scale=idata.posterior.sigma)

    
# Add the simulations to the idata in a "predictions" group
# Congrats, you just did a posterior predictive sampling by hand
import xarray as xr
idata.add_groups({'predictions':xr.Dataset(data_vars={'y_hat':(['chain', 'draw', 'y_hat_dim_2'], simulations)},
                                           coords={'chain':np.arange(n_chains),
                                                   'draw':np.arange(n_draws), 
                                                   'y_hat_dim_2':np.arange(T, T+H)})})

Again, we have a nice funnel of increasing uncertainty, so we can be somewhat confident that nothing went wrong. We also know that the g function is basically the identity, so we should also be happy with the fact that our forecasts are a straight line out from the last data point.

What about more complex models?

Obviously, the GRW is very simple, so doing the forward simulations by hand is really convenient. If we have a more complex model, it might be quite onerous to do this. Many time series models can be cast in linear state space form, though. For example, SARIMA, VARMAX, and exponential smoothing all admit a state-space representation. If you can write down a state-space system, it’s quite easy to do the forward simulations. You would also use this approach to run impulse response functions.

Other models, though, such has HMM or GARCH, don’t admit such a form, and it quickly becomes too much to ask to do the simulation. In that case, I suggest a hack: include missing values after your data to PyMC automatically make predictions for you. Your prediction horizon will be fixed in this case, but at least you won’t have to do anything special to get the predictions. Here’s the GRW model again using this method. The data is unchanged from the first example:

H = 25
extended_data = np.r_[data, np.full(H, np.nan)]

with pm.Model() as grw:
    sigma_innov = pm.Exponential('sigma_innov', 1)
    innovs = pm.Normal('innovations', sigma=sigma_innov, size=100 + H)
    
    sigma = pm.Exponential('sigma', 1)
    y_hat = pm.Normal('y_hat', mu=innovs.cumsum(), sigma=sigma, observed=extended_data)
    
    idata = pm.sample(init='jitter+adapt_diag_grad', target_accept=0.99)
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)

As you can see, we get (almost) the same output with less hassle. The only difference is that the forecasts begin from the last prediction, rather than the last data point (the orange line continues the blue line, rather than the red one).

Conclusion

So there you go. Some pointers on forecasting. All the code used is in this notebook, in case you’d like to look at it all gathered in one place. As I mentioned, there is new functionality that will make foreacsting with recursive models easier, it’s just that I haven’t played around with it much yet so I can’t really say anything about it.

15 Likes

This should be an example in pymc-examples.

5 Likes

Thanks @jessegrabowski, looks great!

I have also settled on just including the forecast period as missing data for now. The main limitation is that it doesn’t (easily) allow for counterfactual scenario forecasts, but otherwise it’s the simplest solution that doesn’t effectively require making every change to the model twice.

I’m not sure what you mean by counterfactual scenario forecasts – something like difference-in-difference analysis? Or impulse response functions/scenario based simulation (scroll down towards the bottom)? Actually, the series of causal inference notebooks have some nice techniques for handling something very close to forecasting. I haven’t carefully read them, but this is a nice opportunity to do so. There is also the do-operator in pymc-experimental, depending on what you’re doing.

Yep, basically something like that - generating multiple forecasts with the same model fit but different future “data”, e.g. y ~ a + B*x + GRW where I want to generate forecasts y for different future trajectories of x without refitting the model. The do operator looks incredibly useful, thanks for pointing me towards that!

Is x_t modeled jointly with y_t, or is it exogenous? If it’s exogenous, you can just use pm.set_data together with the missing value method.

1 Like

@ricardoV94 and @tcapretto just dropped a blog post on this subject that goes much, much deeper into how to handle recursive models than what I showed here. Turns out there’s no need to muck around with copying models, do-operators, or anything of the sort: PyMC can handle updates to sections of your model out of the box just fine. Everyone should definitely check it out.

5 Likes