Predicting out-of-sample for autoregressive models

Hi All,

Is there an easy way to make out-of-sample predictions for autoregressive models? I’m running some forecasting models with two AR components (one is seasonal) and I can’t find a simple way to generate estimates. Any help would be appreciated.

Do you mean like `sample_ppc`? This is a good guide for that.

Thanks, @colcarroll !

It’s possible that I haven’t set up my AR coefficients correctly for these purposes, but since each new observation is based on the previous N observations, I can’t seem to forecast out with PPC, since I can’t pass Y_{t-N} to a shared variable, as it won’t have been estimated yet for the future y_t I’m trying to estimate.

In other words, if my latest in-sample time point is t_100 and the expectation for my likelihood is y_t = a + phi*y_{t-7}, then I have to estimate t_101 before I can get to t_108, which I’m not sure I can do with `sample_ppc`.

I don’t think you can do it using `sample_ppc`, as `AR` and `AR1` do not have a `random` method.

You will have to write a generative function and index into the posterior samples (each one a trace `point`) to generate the PPC.
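For the seasonal example from the question, y_t = a + phi*y_{t-7}, a rough sketch of such a generative function might look like the following. The names `a_draws`, `phi_draws` and `sigma_draws` are hypothetical: they stand for 1-D arrays of posterior draws (for example `trace['a']`, if your model uses those variable names), and the Gaussian noise term is an assumption about the likelihood. Each forecast step feeds on values simulated in earlier steps, which is the part `sample_ppc` cannot do.

```python
import numpy as np

def forecast_seasonal_ar(y_obs, a_draws, phi_draws, sigma_draws, steps, lag=7, seed=None):
    """Simulate `steps` future values of y_t = a + phi * y_{t-lag} + noise.

    Returns an array of shape (n_draws, steps): one simulated path per
    posterior draw. All parameter names here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    y_obs = np.asarray(y_obs, dtype=float)
    a_draws = np.asarray(a_draws, dtype=float)
    phi_draws = np.asarray(phi_draws, dtype=float)
    sigma_draws = np.asarray(sigma_draws, dtype=float)
    n_obs = len(y_obs)
    if n_obs < lag:
        raise ValueError("need at least `lag` observed points to start the recursion")

    # Each row holds the observed history followed by the simulated future.
    paths = np.empty((len(a_draws), n_obs + steps))
    paths[:, :n_obs] = y_obs
    for t in range(n_obs, n_obs + steps):
        # y_{t-lag} is observed for the first `lag` steps and simulated afterwards.
        mean = a_draws + phi_draws * paths[:, t - lag]
        paths[:, t] = mean + rng.normal(0.0, sigma_draws)
    return paths[:, n_obs:]
```

Taking percentiles over the first axis of the result (for instance with `np.percentile(paths, [5, 95], axis=0)`) would then give a forecast interval for each future time point.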

I believe I have a basic working version of this. Assumes your trace has a `rho` and `scale`. This also allows for multiple observations for the same date, which is often a case I’m working with. But if you want to just have one observation per date you could just remove the `date_idx` parts.

```
import numpy as np

def predict_outofsample(trace, date_idx):
    """
    trace: a pymc3 MultiTrace object
    date_idx: np.ndarray with shape (N_obs), indicating for each observation what date it corresponds to
    (so you can have multiple observations on the same day that will have the same prediction)
    """
    samples = []
    horizon = np.max(date_idx)
    # Iterate over posterior draws; each point is a dict of parameter values.
    for point in trace.points():
        rho, scale = point['rho'], point['scale']
        # Start the series with a single innovation, then roll the AR(1) recursion forward.
        thetas = [np.random.normal(loc=0, scale=scale)]
        for i in range(horizon):
            thetas.append(rho * thetas[-1] + np.random.normal(loc=0, scale=scale))
        samples.append(thetas)
    # Shape (n_draws, N_obs): pick the simulated value for each observation's date.
    return np.array(samples)[:, date_idx]
```

Alternatively you can just append `nan`s to your data for the period you want to predict, these will be interpreted as missing values and HMC will generate a posterior predictive for them during inference. But it will be slower than your current approach.
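As a minimal sketch of the padding step only: the helper name `pad_with_missing` is made up for illustration, and it assumes (as I understand PyMC3's behaviour) that masked entries in the array passed as `observed` are treated as missing values to be imputed.

```python
import numpy as np

def pad_with_missing(y_obs, n_future):
    """Append n_future NaN placeholders to y_obs and mask them.

    Hypothetical helper: the masked array is meant to be passed as the
    `observed` data so the trailing entries are treated as missing.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    padded = np.concatenate([y_obs, np.full(n_future, np.nan)])
    # masked_invalid masks every NaN (and inf) entry.
    return np.ma.masked_invalid(padded)
```

The number of placeholders fixes the forecast horizon, so it has to be chosen before sampling.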

Thanks Thomas! Also then I would need to know at time of inference how many future dates I wanted to predict, right?

Yes, that’s right.