Predicting out-of-sample for autoregressive models

Hi All,

Is there an easy way to make out-of-sample predictions for autoregressive models? I’m running some forecasting models with two AR components (one of them seasonal) and I can’t find a simple way to generate forecasts. Any help would be appreciated.

Do you mean like sample_ppc? This is a good guide for that.

Thanks, @colcarroll !

It’s possible that I haven’t set up my AR coefficients correctly for this purpose. Since each new observation depends on the previous N observations, I can’t seem to forecast forward with PPC: I can’t pass Y_{t-N} to a shared variable, because it won’t have been estimated yet for the future y_t I’m trying to predict.

In other words, if my latest in-sample time point is t_100 and the expectation for my likelihood is y_t = a + phi*y_{t-7} , then I have to estimate t_101 before I can get to t_108, which I’m not sure I can do with sample_ppc.

I don’t think you can do it using sample_ppc, as AR and AR1 do not have a random method.

You will have to write a generative function and index into the posterior samples (one trace point at a time) to generate the PPC.
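For the seasonal model from the question (y_t = a + phi*y_{t-7}), a generative function of this kind could be sketched roughly as below. This is only a sketch under assumptions: the function name, the argument names, the Gaussian noise term sigma, and the idea that a, phi and sigma arrive as 1-D arrays of posterior draws (for example pulled out of the trace by variable name) are all hypothetical and not taken from the model in this thread.

```python
import numpy as np

def forecast_seasonal_ar(y, a, phi, sigma, lag=7, steps=14, seed=None):
    """Simulate y_t = a + phi * y_{t-lag} + noise forward, one path per posterior draw.

    y: observed series (needs at least `lag` values)
    a, phi, sigma: 1-D arrays of posterior draws (hypothetical names)
    Returns an array of shape (n_draws, steps).
    """
    rng = np.random.default_rng(seed)
    a, phi, sigma = (np.asarray(v, dtype=float) for v in (a, phi, sigma))
    n_draws = a.shape[0]
    # Each row starts with the last `lag` observations, followed by room for the forecasts.
    paths = np.empty((n_draws, lag + steps))
    paths[:, :lag] = np.asarray(y, dtype=float)[-lag:]
    for k in range(lag, lag + steps):
        # Step k only needs step k - lag, which is either observed or already simulated.
        mean = a + phi * paths[:, k - lag]
        paths[:, k] = rng.normal(mean, sigma)
    return paths[:, lag:]
```

Because the loop fills the path in time order, t_101 is always simulated before t_108 needs it. Summaries such as the mean or percentiles over the first axis then give a forecast with uncertainty bands.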

I believe I have a basic working version of this. It assumes your trace has variables named rho and scale. It also allows for multiple observations on the same date, which is a case I often work with. If you only have one observation per date, you can just remove the date_idx parts.

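    import numpy as np

    def predict_outofsample(trace, date_idx):
        """Simulate one AR(1) path per posterior draw.

        trace: a pymc3 MultiTrace object containing 'rho' and 'scale'
        date_idx: np.ndarray with shape (N_obs), indicating for each observation what date it corresponds to
            (so you can have multiple observations on the same day that will have the same prediction)
        """
        samples = []
        horizon = np.max(date_idx)
        # trace.points() yields one dict of parameter values per posterior draw.
        for point in trace.points():
            rho, scale = point['rho'], point['scale']
            # Start each path from a single innovation, then apply the AR(1) recursion.
            thetas = [np.random.normal(loc=0, scale=scale)]
            for i in range(horizon):
                thetas.append(rho * thetas[-1] + np.random.normal(loc=0, scale=scale))
            samples.append(thetas)
        # Result has shape (n_draws, N_obs): each observation picks up the value for its date.
        return np.array(samples)[:, date_idx]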

Alternatively, you can just append NaNs to your data for the period you want to predict. These will be interpreted as missing values, and HMC will generate a posterior predictive for them during inference. But it will be slower than your current approach.
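A minimal sketch of the padding step, using only NumPy. The helper name pad_with_missing is hypothetical, and it assumes your model accepts a masked (or NaN-containing) array as its observed data and treats the masked entries as missing; check that against the PyMC3 version you are running.

```python
import numpy as np

def pad_with_missing(y, n_future):
    """Return y followed by n_future entries flagged as missing."""
    y = np.asarray(y, dtype=float)
    padded = np.concatenate([y, np.full(n_future, np.nan)])
    # masked_invalid masks every NaN/inf entry, so any NaNs already in y are masked too.
    return np.ma.masked_invalid(padded)

y_obs = np.array([1.2, 0.7, 1.9, 1.4])
# Seven-element masked array: four observed values, then three missing slots to forecast.
y_padded = pad_with_missing(y_obs, n_future=3)
```

The padded array is what you would pass as the observed data. The number of future steps is fixed by n_future before sampling starts.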

Thanks Thomas! Then I would also need to know at inference time how many future dates I want to predict, right?

Yes, that’s right.