Using n-day-ahead prediction in the Bayesian (Hierarchical) VAR model example

Hello all,

In this example blog post on the Hierarchical VAR model, is there any way to modify the model so that we can predict y_{t} from inputs y_{t-n}, y_{t-n-1}, ... (for, say, n=30) instead of just one step back, i.e. inputs y_{t-1}, y_{t-2}, ...?

Thanks in advance!

I’m not totally sure what you mean. If you just want the conditional forecast y_{t+1} \mid \{y_{t-s}\}_{s=0}^{n}, that will be governed by the data you show the model when you sample the posterior parameters: if you use 30 days of data, the one-step out-of-sample forecast will be that conditional distribution. If instead you mean that you want 30 days of data to feed into each prediction, i.e. y_t = f(y_{t-1}, \dots, y_{t-30}), you will have to set the number of lags to 30. Note that in this case you’ll have a lot of trouble sampling, because the probability of sampling stationary and invertible coefficient matrices goes to zero quite quickly; see here.
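
To give a sense of that second point, here is a quick toy check (my own sketch, not from the notebook, and it only checks the stationarity part): draw random lag-coefficient matrices for a two-equation VAR and count how often the implied companion matrix is stable, i.e. all eigenvalues strictly inside the unit circle. The fraction of stable draws collapses as the number of lags grows.

    import numpy as np

    rng = np.random.default_rng(0)
    n_eqs = 2  # two-equation VAR, purely for illustration

    def stable_fraction(n_lags, n_draws=2000, scale=0.5):
        # scale is an arbitrary prior-ish width, just to make the point
        hits = 0
        for _ in range(n_draws):
            B = rng.normal(0.0, scale, size=(n_lags, n_eqs, n_eqs))
            # companion form: [B1 B2 ... Bp] in the first block row,
            # an identity block below it, zeros in the last columns
            top = np.concatenate(list(B), axis=1)
            bottom = np.eye(n_eqs * (n_lags - 1), n_eqs * n_lags)
            companion = np.vstack([top, bottom])
            # stationary iff every eigenvalue lies strictly inside the unit circle
            if np.abs(np.linalg.eigvals(companion)).max() < 1:
                hits += 1
        return hits / n_draws

    for p in (1, 2, 5, 10, 30):
        print(p, stable_fraction(p))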

Thanks for getting back. This is purely an implementation query. Rather than an AR of lag p, of the form:

y_t = \alpha + B_1y_{t-1} + B_2y_{t-2}+\dots+B_py_{t-p}

I’d like the code to allow me to start the lags at an earlier time point, n days back (probably via the function calc_ar_step). I.e. the above is the straightforward case n=1, but I want the implemented code to work for any n:
y_t = \alpha + B_1y_{t-n}+B_2y_{t-n-1}+\dots+B_py_{t-n-(p-1)}.

To be abundantly clear, for n=30, say, I want the code to allow me to fit the model:

y_t = \alpha + B_1y_{t-30}+B_2y_{t-31}+\dots+B_py_{t-30-(p-1)}.
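
In plain numpy terms, the design I’m after would look something like this (a toy sketch with made-up names, just to pin down the indexing):

    import numpy as np

    n, p = 30, 2                   # start lag and number of lag terms (toy values)
    series = np.random.randn(200)  # stand-in for one column of the data

    # column i holds y_{t-n-i}, aligned against the target y_t (assumes n >= 1)
    X = np.stack([series[p - 1 - i : -(n + i)] for i in range(p)], axis=1)
    y = series[n + p - 1 :]

    print(X.shape, y.shape)        # (169, 2) and (169,)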

Hope that makes sense.

You are right, the lagging happens in the calc_ar_step function. You need to change this line:

        ar = pm.math.sum(
            [
                pm.math.sum(lag_coefs[j, i] * df.values[n_lags - (i + 1) : -(i + 1)], axis=-1)
                for i in range(n_lags)
            ],
            axis=0,
        )

to account for the additional shift you want. I’d have to play with it to figure out the exact indexing, but I think you want something like df.values[30 + n_lags - (i + 1) : -(i + 1)]. That would align the 30th value with the first value in the design matrix and lag from there.

Unfortunately, it seems like df is fed into that function in a non-intuitive way - it may be done incrementally - since for i = 0 the indices of the slice are 1 and -1. Is there any way you could have a quick play around and figure out a way? Apologies, I would do it on my own, but I am very new to PyMC and need something fairly quickly to test out - and debugging/writing new PyMC code seems a bit esoteric to me at the moment, haha.

Btw - thanks for the link to your blog/articles - I’m currently working on a Bayesian time series model, so I’ll take a look through them at some point in the near future!
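
Sure - here’s a rough, untested sketch of how calc_ar_step could take an extra shift argument. With shift=1 it should reduce to the current slicing, and with shift=30 the coefficient lag_coefs[j, i] multiplies the value lagged by 30 + i. The argument name is mine, and wherever the notebook slices the observed data (I believe it’s df.values[n_lags:]) you’d need the matching df.values[shift + n_lags - 1:] so targets and regressors stay aligned:

    import pymc as pm

    def calc_ar_step(lag_coefs, n_eqs, n_lags, df, shift=1):
        ars = []
        for j in range(n_eqs):
            ar = pm.math.sum(
                [
                    # the slice start is unchanged from the original; only the end
                    # moves back, so row k of this slice sits (shift + i) steps
                    # before row k of the target df.values[shift + n_lags - 1:]
                    pm.math.sum(
                        lag_coefs[j, i] * df.values[n_lags - (i + 1) : -(shift + i)],
                        axis=-1,
                    )
                    for i in range(n_lags)
                ],
                axis=0,
            )
            ars.append(ar)
        beta = pm.math.stack(ars, axis=-1)
        return beta

The slicing matches the toy numpy indexing above, but I’d sanity-check it on a small example before trusting any fits.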