Sample_posterior_predictive not catching shape of new data

Your predictions don’t look right. In a time series model you should never have wiggles and dynamics in the average prediction unless you built them in, e.g. with seasonal components or AR terms. For a GRW, your best guess of tomorrow’s value is today’s value, so your predictions should be a straight line out from the last observed value, with increasing “cup shaped” uncertainty as you move out from the observations in time (the forecast standard deviation should be proportional to \sqrt{h}, where h is the number of steps past the last observation).

To see this, here’s some math. The GRW model can be written:

y_{t+1} = y_t + \epsilon_{t+1}, \quad \epsilon_t \sim N(\mu, \sigma)
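For concreteness, here’s a minimal PyMC sketch of this kind of model. It’s not the exact model from the question (which isn’t reproduced here); the priors and the observation noise scale are assumptions for illustration:

import pymc as pm

T = 50  # length of the observed series

with pm.Model() as model:
    ϵ = pm.HalfNormal('ϵ', sigma=1)   # innovation scale
    w = pm.GaussianRandomWalk('w', sigma=ϵ,
                              init_dist=pm.Normal.dist(0, 1), shape=T)
    # score is the data array from the question
    pm.Normal('y_hat', mu=w, sigma=0.1, observed=score[0, :T])
    trace = pm.sample()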

Important points:

  1. This is a so-called “Markov Process”, meaning that, given the most recent value, including information about earlier values does not improve your ability to forecast future values. Formally, \mathbb E [y_{t+1} | \{y_\tau\}_{\tau=0}^t] = \mathbb E [y_{t+1} | y_t], i.e. all that extra history is useless.
  2. The mean future prediction, then, is simply \mathbb E_t [y_{t+1}] = \mathbb E_t [y_t + \epsilon_{t+1}] = y_t + \mathbb E_t[\epsilon_{t+1}] = y_t + \mu
  3. Similarly, the one-step forecast variance is just Var_t (y_{t+1}) = Var_t (y_t + \epsilon_{t+1}) = Var(\epsilon_{t+1}) = \sigma^2
  4. By the “Law of Iterated Expectations”, all your best guesses about future values boil down to only your knowledge today. If we predict for time t+2, we get \mathbb E_t [y_{t+2}] = \mathbb E_t [\mathbb E_{t+1} [y_{t+2}]] = \mathbb E_t [\mathbb E_{t+1} [y_{t+1} + \epsilon_{t+2}]] = \mathbb E_t [y_{t+1} + \mu]. The important point is that the nested expectation washes out, so you get \mathbb E_t [y_{t+2}] = y_t + 2\mu.

This generalizes to \mathbb E_t [y_{t+h}] = y_t + h\mu, so your predicted mean should be the last observed value plus a deterministic trend with slope \mu (0 in your case).

You can do the same thing for variance: Var_t (y_{t+2}) = Var_t(y_{t+1} + \epsilon_{t+2}) = Var_t(y_t + \epsilon_{t+1} + \epsilon_{t+2}). The only important thing to know here is that variance is quadratic, so when you distribute it over addition, little covariance babies pop out, as in Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b). I appeal to Cov(\epsilon_t, \epsilon_s) = 0 \ \forall t \neq s to ignore these, and you end up with Var_t (y_{t+2}) = 2\sigma^2. So in general, the standard deviation of your prediction for y_{T+h} will be \sqrt{h} \cdot \sigma.
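If you want to sanity-check these formulas by brute force, here’s a quick simulation; the drift, scale, and starting value are all made up:

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, y_t = 0.0, 1.5, 10.0      # made-up drift, innovation scale, last value
h, n_paths = 20, 100_000

# Simulate n_paths GRW trajectories h steps past y_t
paths = y_t + rng.normal(mu, sigma, size=(n_paths, h)).cumsum(axis=1)

print(paths[:, -1].mean())   # ≈ y_t + h * mu = 10.0
print(paths[:, -1].std())    # ≈ sigma * sqrt(h) ≈ 6.71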

So far I’ve said nothing helpful about making predictions with your PyMC model… sorry about that. Basically, you want to sample from the posterior distribution over w[-1] and ϵ, draw h innovations from a normal with mean zero (the drift) and standard deviation ϵ (where h is your forecast horizon), take the cumsum, and add the result to w[-1] to get one forecast path. Do this a lot of times and you’ll get a nice cup shaped forecast distribution.

You can also skip the hassle and just use the formulas derived above.
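For example, something like the following reproduces the forecast mean and the cup directly. Note it plugs in posterior means for w[-1] and ϵ, which ignores the posterior uncertainty in those quantities, so treat it as a sketch:

import arviz as az
import numpy as np

h = 20
post = az.extract_dataset(trace)

w_last = preds.stack(sample=['chain', 'draw']).values[0, -1, :].mean()
sigma_hat = post['ϵ'].values.mean()

steps = np.arange(1, h + 1)
mean_forecast = np.full(h, w_last)      # mu = 0, so the mean forecast is flat
band = 2 * sigma_hat * np.sqrt(steps)   # ±2 sd "cup" around the mean
lower, upper = mean_forecast - band, mean_forecast + band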

Here’s the first approach applied to the first time series in score, starting after sampling the model you posted:

import arviz as az
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng()

T = 50   # length of the observed series
h = 20   # forecast horizon

post = az.extract_dataset(trace)
preds_flat = preds.stack(sample=['chain', 'draw'])
n_samples = post.sizes['sample']

# Note that the last in-sample prediction is *not* included in the cumsum,
# it is the starting point "x0", not the drift!
forecasts = preds_flat.values[0, -1, :] + (
    post['ϵ'].values[None, :] * rng.normal(size=(h, n_samples))
).cumsum(axis=0)

fig, ax = plt.subplots(figsize=(14, 4), dpi=77)

# In-sample posterior predictive draws, their mean, and the observed data
ax.plot(preds_flat[0, :, :].values, color='0.5', alpha=0.05)
ax.plot(preds_flat.mean(dim='sample').values[0, :], color='tab:blue')
ax.plot(score[0, :], color='k')

# Forecast paths and their mean
ax.plot(np.arange(T, T + h), forecasts, color='tab:green', alpha=0.1)
ax.plot(np.arange(T, T + h), forecasts.mean(axis=-1), color='k', ls='--')
plt.show()

As a general comment, you can always make out-of-sample predictions by just re-implementing your model in Numpy, using the posterior samples together with whatever data you wish. In this case it’s a bit opaque how the pieces fit together, but that’s all we’re doing.
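Schematically, the pattern is just the following, where simulate_one is a hypothetical stand-in for whatever your model’s generative process happens to be:

import numpy as np

rng = np.random.default_rng()

def simulate_one(w_last, eps, h, rng):
    # One forward pass of the (hypothetical) model, given one posterior draw
    return w_last + rng.normal(0, eps, size=h).cumsum()

w_last_samples = preds.stack(sample=['chain', 'draw']).values[0, -1, :]
eps_samples = post['ϵ'].values

forecasts = np.stack([simulate_one(w, e, h=20, rng=rng)
                      for w, e in zip(w_last_samples, eps_samples)])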
