# Best Practices for Time Series Forecasting

We should first distinguish between recursive and non-recursive time-series models. @kforeman hinted at this distinction, which I just want to explore formally. Before we start, I want to address the numbered questions in the OP:

1. In the non-recursive case, use MutableData. In the recursive case, it might be necessary to do something more complex. I will present some solutions that work for me, but @ricardoV94 is working on some more sophisticated/generalization approaches with model cloning and reconstruction here. I won’t address any of these, but suffice to say that forcasting should be better more convenient in the future. But that’s just my forecast

2. Since the length of variables is going to change, you will need to either omit coords or re-set them after estimation. Personally I omit them at sampling time, because it’s a hassle.

3. The predictions=True flag has nothing at all to do with forecasting. AFAIK, it just makes a new group in your idata object. This allows you to, in the same object, hold posterior predictive samples for both the training and test data. If you do not set new data with pm.set_data and run pm.sample_posterior_predictive(predictions=True), you will just end up with the training distribution, stored in a different place.

On to some specifics:

## Non-Recursive Models

These are models in which time simply enters as a covariate. The model the OP posted sort of falls into this class. But actually the model he posted is more like a “cross-section of times”, since there is no time variable. I am splitting hairs here because in order to forecast, as opposed to predict, we should be able to arbitrarily extend the data out into the future. In the posted case, we need the covariate observations from the future to make forecasts, which is of course impossibe.

What I have in mind is instead more like a deterministic trend model: y_t = \alpha + \gamma t + \epsilon_t. Here, we have a time variable t, which we can arbitrarily extend out into the future because it’s just a counter. Here’s a code example:

### Example 1: Deterministic Trend

#### Data Generation

true_alpha = 1
true_gamma = 0.2
true_sigma = 1.5
T = 100

noise = rng.normal(scale=true_sigma, size=T)
t = np.arange(T)
data = true_alpha + true_gamma * t + noise
train_data = data[:90]
test_data = data[-10:]

plt.plot(train_data, color='tab:blue')
plt.plot(np.arange(90, 100), test_data, ls='--', color='tab:red')


#### Sampling

with pm.Model() as det_trend:
t_pt = pm.MutableData('t', t[:90])
alpha = pm.Normal('alpha')
gamma = pm.Normal('gamma')

mu = pm.Deterministic('mu', alpha + gamma * t_pt)
sigma = pm.Exponential('sigma', 1)

y_hat = pm.Normal('y_hat', mu=mu, sigma=sigma, observed=train_data, shape=t_pt.shape)
idata = pm.sample()


#### Prediction and Forecasting

with det_trend:
#in-sample
idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)

#out-of-sample
pm.set_data({'t':t[-10:]})
idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True, predictions=True)


In this case I generated the test data so I have it to plot and examine, but you can see that in principle I can extend these forecasts out forever by just setting the t variable to whatever I want, even if we don’t have any new data yet. Obviously in this example it doesn’t generate very exciting forecasts, but this methodology is actually very powerful because nothing prevents you from using non-linear functions of t. In this next example I make forecasts on real data using a prophet-like model, which uses a piecewise linear trend and Fourier seasonality to create rich time dynamics that can be arbitrarily extended into the future:

### Example 2: Prophet

Data are from Forecasting: Principles and Practice. I centered and normalized the values so I could autopilot though the prior definitions.

#### Model

First I make some functions that take in t and return some non-linear transformations:

def create_piecewise_trend(t, t_max, n_changepoints):
s = pt.linspace(0, t_max, n_changepoints+2)[1:-1]
A = (t[:, None] > s)*1

return A, s

def create_fourier_features(t, n, p=365.25):
x = 2 * np.pi * (pt.arange(n)+1) * t[:, None] / p
return pt.concatenate((pt.cos(x), pt.sin(x)), axis = 1)

def generate_features(t, t_max, n_changepoints=10, n_fourier=6, p=365.25):
A, s = create_piecewise_trend(t, t_max, n_changepoints)
X = create_fourier_features(t, n_fourier, p)

return A, s, X


Then use these in my PyMC model:

t = np.arange(df.shape[0])

# This value isn't on the computation graph (it's not computed dynamically from t_pt)
t_max = max(t)

with pm.Model() as prophet_model:
t_pt = pm.MutableData('t', t)

# We have monthly data, so p=12 leads to annual seasonality
A, s, X = generate_features(t_pt, t_max, n_changepoints=10, n_fourier=6, p=12)

initial_slope = pm.Normal('initial_slope')
initial_intercept = pm.Normal('initial_intercept')

# n_changepoint offsets terms to build the peicewise trend
deltas = pm.Normal('offset_delta', shape=(10,))

intercept = initial_intercept + ((-s * A) * deltas).sum(axis=1)
slope = initial_slope + (A * deltas).sum(axis=1)

# n_fourier * 2 seasonal coefficients
beta = pm.Normal('beta', size=12)

mu = pm.Deterministic('mu', intercept + slope * t_pt+ X @ beta)
sigma = pm.Exponential('sigma', 1)
y_hat = pm.Normal('y_hat', mu=mu, sigma=sigma, observed=df.values.ravel(), shape=t_pt.shape[0])

idata = pm.sample()


### Forcasting

with prophet_model:
#in-sample
idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)

#out-of-sample, predict 3 years of home sales
last_t = t[-1]
forcast_t = np.arange(last_t, last_t + 36)
pm.set_data({'t':forcast_t})
idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True, predictions=True)


Now I’m working with real data, and I didn’t set aside any test set on purpose. I want to demonstrate that you can just plug in any value for t that you want and just go hog wild predicting the future. What could possibly go wrong?

# Recursive Models

Recursive models are slightly more complex, because we are learning some policy function y_{t+1} = g(y_t). Making forecasts, then, isn’t as simple as just doing pm.set_data, but it also isn’t much more complex. Basically, we’re going to:

1. Have an unknown distribution x0 that descirbes the initial state of the system before we see data
2. From this initial x0, sample the posterior \theta s that parameterize the g function (including x0)
3. Replace x0 with a DiracDelta(y[-1]), a spike distribution over the last observed data point
4. Simulate forward using the g function

Here’s a simple example using a Guassian random walk. In this case, y_{t+1} = g(y_t) = y_t + \epsilon_t, and we just have to learn one parameter, \sigma. It’s easy to implement this model in PyMC using .cumsum():

### Generate Data

true_sigma = 0.1
T = 100

innovations = rng.normal(scale=true_sigma, size=T)
data = innovations.cumsum()
plt.plot(data)


### PyMC Model

with pm.Model() as grw:
sigma_innov = pm.Exponential('sigma_innov', 1)
innovs = pm.Normal('innovations', sigma=sigma_innov, size=100)

sigma = pm.Exponential('sigma', 1)
y_hat = pm.Normal('y_hat', mu=innovs.cumsum(), sigma=sigma, observed=data)



### Prediction and Forecasting

# First the easy part, in-sample prediction
with grw:
idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)

# Next simulate the recursive model
H = 25
simulations = np.empty((n_chains, n_draws, H))
x0 = data[-1]

simulations[:, :, 0] = x0

# simulate the forward model
for t in range(1, H):
simulations[:, :, t] = simulations[:, :, t-1] + rng.normal(scale=idata.posterior.sigma_innov) + rng.normal(scale=idata.posterior.sigma)

# Add the simulations to the idata in a "predictions" group
# Congrats, you just did a posterior predictive sampling by hand
import xarray as xr
coords={'chain':np.arange(n_chains),
'draw':np.arange(n_draws),
'y_hat_dim_2':np.arange(T, T+H)})})


Again, we have a nice funnel of increasing uncertainty, so we can be somewhat confident that nothing went wrong. We also know that the g function is basically the identity, so we should also be happy with the fact that our forecasts are a straight line out from the last data point.

## What about more complex models?

Obviously, the GRW is very simple, so doing the forward simulations by hand is really convenient. If we have a more complex model, it might be quite onerous to do this. Many time series models can be cast in linear state space form, though. For example, SARIMA, VARMAX, and exponential smoothing all admit a state-space representation. If you can write down a state-space system, it’s quite easy to do the forward simulations. You would also use this approach to run impulse response functions.

Other models, though, such has HMM or GARCH, don’t admit such a form, and it quickly becomes too much to ask to do the simulation. In that case, I suggest a hack: include missing values after your data to PyMC automatically make predictions for you. Your prediction horizon will be fixed in this case, but at least you won’t have to do anything special to get the predictions. Here’s the GRW model again using this method. The data is unchanged from the first example:

H = 25
extended_data = np.r_[data, np.full(H, np.nan)]

with pm.Model() as grw:
sigma_innov = pm.Exponential('sigma_innov', 1)
innovs = pm.Normal('innovations', sigma=sigma_innov, size=100 + H)

sigma = pm.Exponential('sigma', 1)
y_hat = pm.Normal('y_hat', mu=innovs.cumsum(), sigma=sigma, observed=extended_data)