Your predictions don’t look right. In a time series model you should never have wiggles and dynamics in the average prediction unless you built them in, e.g. with seasonal components or AR terms. For a Gaussian random walk (GRW), your best guess of tomorrow’s value is today’s value, so your predicted mean should be a straight line out from the last observed value, with an increasing, “cup shaped” band of uncertainty as you move out from the observations in time (the forecast standard deviation should be proportional to \sqrt{h}, where h is the number of steps ahead).
To see this, here’s some math. The GRW model can be written:
y_{t+1} = y_t + \epsilon_{t+1}, \quad \epsilon_t \sim N(\mu, \sigma)
where \mu is the drift and \sigma is the standard deviation of the innovations.
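To make that concrete, here’s a minimal simulation of a single GRW path (the values of \mu, \sigma, and T here are made up for illustration):
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, T = 0.0, 1.0, 50  # made-up values for illustration

# A GRW path is just the cumulative sum of iid normal innovations
y = rng.normal(mu, sigma, size=T).cumsum()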
Important points:
- This is a so-called “Markov Process”, meaning that once you know the current value, including information about past values does not improve your ability to forecast future values. Formally, \mathbb E [y_{t+1} | \{y_\tau\}_{\tau=0}^t] = \mathbb E [y_{t+1} | y_t], i.e. all that extra history is useless.
- The mean future prediction, then, is simply \mathbb E_t [y_{t+1}] = \mathbb E_t [y_t + \epsilon_{t+1}] = y_t + \mathbb E_t[\epsilon_{t+1}] = y_t + \mu
- Similarly, the forecast variance is just Var_t (y_{t+1}) = Var_t (y_t + \epsilon_{t+1}) = Var_t(\epsilon_{t+1}) = \sigma^2, since y_t is known at time t and contributes no variance.
- By the “Law of Iterated Expectations”, \mathbb E_t [\mathbb E_{t+1} [\cdot]] = \mathbb E_t [\cdot], all your best guesses about future values boil down to only your knowledge today. It means that if we predict for time t+2, we’ll end up with \mathbb E_t [y_{t+2}] = \mathbb E_t [\mathbb E_{t+1} [y_{t+1} + \epsilon_{t+2}]] = \mathbb E_t [y_{t+1} + \mu] = y_t + \mu + \mu. The important point is that the inner expectation washes out, so you get \mathbb E_t [y_{t+2}] = y_t + 2\mu.
This generalizes to \mathbb E_t [y_{t+h}] = y_t + h\mu, so your predicted mean should be the last observed value plus a deterministic trend with slope \mu (0 in your case).
You can do the same thing for variance: Var_t (y_{t+2}) = Var_t(y_{t+1} + \epsilon_{t+2}) = Var_t(y_t + \epsilon_{t+1} + \epsilon_{t+2}). The only important thing to know here is that variance is quadratic, so you can distribute it over addition, but little covariance babies pop out, as in Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b). I appeal to Cov(\epsilon_t, \epsilon_s) = 0, \quad \forall t \neq s to ignore these, and you end up with Var_t (y_{t+2}) = 2\sigma^2. So in general, the variance of your prediction for y_{T+h} will be h\sigma^2, and the standard deviation will be \sqrt{h} \cdot \sigma.
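If you don’t trust the algebra, here’s a quick Monte Carlo check of both formulas (the values of \mu, \sigma, h, and the last observation are made up for illustration):
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, h = 0.5, 2.0, 20   # made-up values
y_T = 10.0                    # pretend last observed value
n_paths = 100_000

# Simulate many h-step continuations of the walk starting from y_T
paths = y_T + rng.normal(mu, sigma, size=(n_paths, h)).cumsum(axis=1)

print(paths[:, -1].mean())  # ≈ y_T + h * mu = 20.0
print(paths[:, -1].std())   # ≈ np.sqrt(h) * sigma ≈ 8.94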
So far I’ve said nothing helpful about making predictions with your PyMC model… sorry about that. Basically, you want to sample from the posterior distribution over w[-1] and ϵ, draw h innovations from a normal centered on zero (the drift is 0 here) with standard deviation ϵ, where h is your forecast horizon, take the cumsum of those innovations, then add w[-1] to get your predictions. Do this a lot of times and you’ll get a nice cup shaped forecast distribution.
You can also skip the hassle and just use the formulas derived above.
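For instance, here’s a minimal sketch of that shortcut, assuming w_last and eps_draws are 1-d numpy arrays of posterior draws of w[-1] and ϵ (those names are placeholders; note that plugging in posterior means ignores parameter uncertainty, unlike the sampling approach):
import numpy as np

# w_last, eps_draws: hypothetical 1d arrays of posterior draws of w[-1] and ϵ
h = 20  # forecast horizon
hs = np.arange(1, h + 1)
mean_forecast = np.full(h, w_last.mean())     # drift mu is 0 in your model
sd_forecast = np.sqrt(hs) * eps_draws.mean()  # sd grows like sqrt(h)

# Approximate 95% forecast bands
lower = mean_forecast - 1.96 * sd_forecast
upper = mean_forecast + 1.96 * sd_forecast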
Here’s the first approach applied to the first time series in score, starting after sampling the model you posted:
import arviz as az
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng()
T = 50  # number of observed timesteps
h = 20  # forecast horizon

post = az.extract_dataset(trace)
n_samples = post.sizes['sample']
# In-sample posterior predictive draws, stacked to (series, time, sample)
in_sample = preds.stack(sample=['chain', 'draw'])
# The last in-sample value is the starting point "x0" -- it is *not* included
# in the cumsum; only the innovations are accumulated
forecasts = in_sample.values[0, -1, :] + (post['ϵ'].values[None, :] * rng.normal(size=(h, n_samples))).cumsum(axis=0)

fig, ax = plt.subplots(figsize=(14, 4), dpi=77)
ax.plot(in_sample[0, :, :], color='0.5', alpha=0.05)
ax.plot(in_sample.mean(dim='sample').values[0, :], color='tab:blue')
ax.plot(score[0, :], color='k')
ax.plot(np.arange(T, T + h), forecasts, color='tab:green', alpha=0.1)
ax.plot(np.arange(T, T + h), forecasts.mean(axis=-1), color='k', ls='--')
plt.show()
As a general comment, you can always make out-of-sample predictions by just re-implementing your model in Numpy, using the posterior samples together with whatever data you wish. In this case it’s a bit opaque how the pieces fit together, but that’s all we’re doing.
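As a sketch of that general recipe (the function here is hypothetical, not part of any library), you loop over the posterior draws and push each one through your model’s generative equations, starting from where the data ends:
import numpy as np

def forecast_one_draw(eps_draw, last_value, h, rng):
    # Forward-simulate one posterior draw of the GRW, h steps ahead.
    # For a richer model, replace this transition with that model's
    # generative equation -- the pattern stays the same.
    innovations = rng.normal(loc=0.0, scale=eps_draw, size=h)
    return last_value + innovations.cumsum()

Stacking forecast_one_draw over all the posterior draws reproduces the forecasts array computed above.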