Predicting out-of-sample for autoregressive models


Hi All,

Is there an easy way to make out-of-sample predictions for autoregressive models? I’m running some forecasting models with two AR components (one of them seasonal) and I can’t find a simple way to generate forecasts. Any help would be appreciated.


Do you mean like sample_ppc? This is a good guide for that.


Thanks, @colcarroll !

It’s possible that I haven’t set up my AR coefficients correctly for this, but since each new observation depends on the previous N observations, I can’t seem to forecast with PPC: I can’t pass Y_{t-N} to a shared variable, because it won’t have been estimated yet for the future y_t I’m trying to predict.

In other words, if my latest in-sample time point is t_100 and the expectation for my likelihood is y_t = a + phi*y_{t-7} , then I have to estimate t_101 before I can get to t_108, which I’m not sure I can do with sample_ppc.
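To make that dependency concrete, here is a minimal NumPy sketch of the recursion, assuming fixed point values for a and phi and ignoring the noise term. The function name `forecast_seasonal_ar` and the example numbers are hypothetical and only for illustration.

```python
import numpy as np

def forecast_seasonal_ar(y, a, phi, lag=7, steps=14):
    """Roll y_t = a + phi * y_{t-lag} forward `steps` points past the end of y."""
    history = list(np.asarray(y, dtype=float))
    if len(history) < lag:
        raise ValueError("need at least `lag` observed points")
    for _ in range(steps):
        # Each new point uses the value `lag` steps back, which is itself
        # a forecast once we are more than `lag` steps past the data.
        history.append(a + phi * history[-lag])
    return np.array(history[len(y):])

# Toy data: 100 observed points, forecast 8 steps ahead.
y_obs = np.arange(100, dtype=float)
forecast = forecast_seasonal_ar(y_obs, a=1.0, phi=0.5, lag=7, steps=8)
```

The first seven forecasts use only observed data, while the eighth already depends on the first forecast, which is why the future points have to be generated in order.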


I don’t think you can do it using sample_ppc, as AR and AR1 do not have a random method.

You will have to write a generative function and loop over the posterior samples (each point in the trace) to generate the PPC yourself.


I believe I have a basic working version of this. It assumes your trace has a rho and a scale. It also allows for multiple observations on the same date, which is a case I often work with. If you only have one observation per date, you could just remove the date_idx parts.

    import numpy as np

    def predict_outofsample(trace, date_idx):
        """
        trace: a pymc3 MultiTrace object
        date_idx: np.ndarray with shape (N_obs), indicating for each observation what date it corresponds to
            (so you can have multiple observations on the same day that will have the same prediction)
        """
        samples = []
        horizon = np.max(date_idx)
        # Simulate one AR(1) path per posterior draw
        for point in trace.points():
            rho, scale = point['rho'], point['scale']
            thetas = [np.random.normal(loc=0, scale=scale)]
            # horizon + 1 values in total, so every index in date_idx is valid
            for i in range(horizon):
                thetas.append(rho*thetas[-1] + np.random.normal(loc=0, scale=scale))
            samples.append(thetas)
        return np.array(samples)[:, date_idx]

Non-deterministic MCMC model with updates

Alternatively, you can just append NaNs to your data for the period you want to predict. These will be interpreted as missing values, and HMC will generate a posterior predictive for them during inference. It will be slower than your current approach, though.
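As a rough sketch of the data preparation for this, assuming the padded array is then passed as `observed` to the likelihood (the helper name `pad_with_missing` is hypothetical, and whether a given AR likelihood accepts missing values should be checked for your PyMC3 version):

```python
import numpy as np

def pad_with_missing(y, n_future):
    """Append n_future NaNs to y and mask them so they are treated as missing."""
    y = np.asarray(y, dtype=float)
    padded = np.concatenate([y, np.full(n_future, np.nan)])
    # masked_invalid masks NaN (and inf) entries; the observed values stay unmasked
    return np.ma.masked_invalid(padded)

# Toy example: three observed points, two future points to predict
padded = pad_with_missing([1.0, 2.0, 3.0], n_future=2)
```

Here `n_future` fixes the forecast horizon before sampling starts.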


Thanks Thomas! Also then I would need to know at time of inference how many future dates I wanted to predict, right?


Yes, that’s right.