The statespace package always works with a model of this form:
\begin{align}
y_t &= Z x_t + d + \eta_t, & \eta_t \sim N(0, H) \\
x_t &= T x_{t-1} + c + R \varepsilon_t, & \varepsilon_t \sim N(0, Q)
\end{align}
Looking at this formula from the statsmodels docs:
Ay_t = A_1 y_{t-1} + \ldots + A_p y_{t-p} + B \varepsilon_t, \quad \varepsilon_t \sim N(0, I_k)
I write I_k because we assume y_t has shape k \times 1 (there are k endogenous variables) and we’re working with a VAR(p, 0) model. I assume the errors are spherical in statsmodels (that is, anything we know about the covariance is pre-factored out into B), but I actually don’t know – that’s an important detail to check.
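If that assumption holds, then with \varepsilon_t \sim N(0, I_k) the implied error covariance is carried entirely by B:
\operatorname{Cov}(B \varepsilon_t) = B \, I_k \, B^T = B B^T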
Anyway, to get from one to the other, you’re going to have to do some rewriting:
y_t = A^{-1} A_1 y_{t-1} + \dots + A^{-1} A_p y_{t-p} + A^{-1} B \varepsilon_t
Call A^{-1} A_s = \Gamma_s, so the system is y_t = \sum_{s=1}^p \Gamma_s y_{t-s} + A^{-1} B \varepsilon_t.
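As a concrete (and entirely made-up) example of that step, here is a small NumPy sketch computing the \Gamma_s matrices from hypothetical A and A_s matrices. Using np.linalg.solve avoids forming A^{-1} explicitly:
import numpy as np

k, p = 3, 2
rng = np.random.default_rng(0)

# Hypothetical structural matrices: A is lower triangular with ones on the diagonal,
# A_lags stacks A_1, ..., A_p
A = np.eye(k) + np.tril(rng.normal(size=(k, k)), k=-1)
A_lags = rng.normal(size=(p, k, k))

# Gamma_s = A^{-1} A_s; solve broadcasts over the leading (lag) dimension
Gammas = np.linalg.solve(A, A_lags)
assert Gammas.shape == (p, k, k)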
In the statespace framework, you work with everything “stacked” into a single state vector. So define:
x_t = \begin{bmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{bmatrix}
Next you need a map from \{\Gamma_1, \dots, \Gamma_p\} to T. This is done with vectorization. Define \text{rvec}(X) as a “row vectorize” operator that stacks the rows of a matrix into a single row vector from top to bottom (the usual \text{vec} operator works on columns, so I’m trying to avoid confusion).
<BEGIN EDIT>
The formula I had before for the T matrix was wrong. I should have taken my own advice and looked at the digression. Here’s an example of T for a VAR(2, 0). The indices are in the form (equation, coefficient, lag order):
\begin{bmatrix} a_{x,x,1} & a_{x,y,1} & a_{x,z,1} & a_{x,x,2} & a_{x,y,2} & a_{x,z,2} \\
a_{y,x,1} & a_{y,y,1} & a_{y,z,1} & a_{y,x,2} & a_{y,y,2} & a_{y,z,2} \\
a_{z,x,1} & a_{z,y,1} & a_{z,z,1} & a_{z,x,2} & a_{z,y,2} & a_{z,z,2} \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}
You can see there are 4 square blocks. On top there’s one for the lag 1 coefficients, with equations on the rows and coefficients on the columns, then the same for lag 2. Then there’s a k \times k identity matrix, and a k \times k zero matrix. So in block form, we have:
T = \begin{bmatrix} \Gamma_1 & \Gamma_2 \\ I_k & 0_k \end{bmatrix}
Continuing the pattern, for VAR(3) one has:
T = \begin{bmatrix} \Gamma_1 & \Gamma_2 & \Gamma_3 \\ I_k & 0_k & 0_k \\ 0_k & I_k & 0_k \end{bmatrix}
And so on.
<END EDIT>
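To make the block structure concrete, here's a small NumPy sketch (with stand-in \Gamma matrices, purely for illustration) that assembles T for a VAR(2) with k = 3:
import numpy as np

k, p = 3, 2
rng = np.random.default_rng(0)
Gammas = rng.normal(size=(p, k, k))  # stand-ins for Gamma_1 and Gamma_2

# Companion form: top block row is [Gamma_1  Gamma_2], bottom block row is [I_k  0_k]
T = np.block([
    [Gammas[0], Gammas[1]],
    [np.eye(k), np.zeros((k, k))],
])
assert T.shape == (k * p, k * p)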
There’s a section in the linked notebook called "Digression: A note on representation" that would be useful to look at now, to see how this would look in a VAR(1, 0) case with 3 endogenous variables.
Now we just need R and Q. \varepsilon_t isn’t going to change. So the job of R is to make it conform in shape to the new state vector x_t. Nothing in x_t gets shocks except the first k entries (the block corresponding to y_t), so R is going to have a lot of zeros. One choice is to define R = \begin{bmatrix} A^{-1} B \\ 0 \\ \vdots \\ 0 \end{bmatrix}, which implies that Q = I_k.
Alternatively, you could decide to define R = \begin{bmatrix} I_k \\ 0 \\ \vdots \\ 0 \end{bmatrix}, in which case you’re now forced to pick Q = A^{-1} B B^T A^{-T} (that follows by analogy with the scalar case: multiplying a random variable a \sim N(0, 1) by a scalar b gives a \cdot b \sim N(0, b^2)).
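A quick check that the two choices are equivalent: either way the shock enters the state with the same covariance, since
\begin{align}
\begin{bmatrix} A^{-1} B \\ 0 \end{bmatrix} I_k \begin{bmatrix} A^{-1} B \\ 0 \end{bmatrix}^T
= \begin{bmatrix} I_k \\ 0 \end{bmatrix} \left( A^{-1} B B^T A^{-T} \right) \begin{bmatrix} I_k \\ 0 \end{bmatrix}^T
= \begin{bmatrix} A^{-1} B B^T A^{-T} & 0 \\ 0 & 0 \end{bmatrix}
\end{align}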
With that we’re basically done. Define Z = \begin{bmatrix} I_k & 0 & \dots & 0 \end{bmatrix} (it just picks out the top k elements of the x_t vector), and we have everything in statespace form.
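For completeness, here's how Z and the identity-on-top version of R look in NumPy (again just a sketch, with k = 3 and p = 2):
import numpy as np

k, p = 3, 2

# Z picks out the top k entries of x_t, so Z @ x_t recovers y_t
Z = np.hstack([np.eye(k), np.zeros((k, k * (p - 1)))])

# Selection matrix with I_k on top -- the second choice for R above
R = np.vstack([np.eye(k), np.zeros((k * (p - 1), k))])

assert Z.shape == (k, k * p)
assert R.shape == (k * p, k)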
From here, you have two choices:
Option 1 is to just use the pre-defined BayesianVARMAX model directly. This will force you to go with the covariance definition of R and Q, because it internally sets R to the one with the I_k on top.
a. Going with the example you gave, you need 6 coefficients in the A matrix. Sample those, then arrange them into a lower triangular matrix.
b. Next, sample your VAR coefficient matrices. Then construct the \Gamma matrices, and finally ravel, concatenate, and reshape into the form expected by the model.
c. Finally, do the same for the covariance, computing A^{-1} B B^T A^{-T}.
This model would look something like this (priors are dumb on purpose):
from pymc_extras.statespace import BayesianVARMAX
# freeze_dims_and_data lives in pymc core -- it returns a copy of the model with dims and data frozen
from pymc.model.transform.optimization import freeze_dims_and_data
import pymc as pm
import pytensor.tensor as pt
import numpy as np

mod = BayesianVARMAX(order=(2, 0),
                     endog_names=['x', 'y', 'z'],
                     stationary_initialization=True)
k = mod.k_posdef

A_mask = np.array([[1, 0, 0],
                   [1, 1, 0],
                   [1, 1, 1]])
B_mask = np.eye(3)

# Cast to int so the counts can be passed directly as shapes
n_nonzero_A = int(A_mask.sum())
n_nonzero_B = int(B_mask.sum())

with pm.Model(coords=mod.coords) as m:
    A_coefs = pm.Normal('A_coefs', 0, 1, shape=(n_nonzero_A,))
    B_coefs = pm.Normal('B_coefs', 0, 1, shape=(n_nonzero_B,))

    # .set is necessary in pytensor to allocate a value inside a matrix. Here, we use A_mask
    # and B_mask to do fancy indexing
    A = pm.Deterministic('A',
                         pt.zeros((k, k))[A_mask.astype(bool)].set(A_coefs),
                         dims=['observed_state', 'observed_state_aux'])
    B = pm.Deterministic('B',
                         pt.zeros((k, k))[B_mask.astype(bool)].set(B_coefs),
                         dims=['observed_state', 'observed_state_aux'])

    # According to the build report, ar_params needs to have dims ['observed_state', 'ar_lag', 'observed_state_aux'].
    # But we need to compute A_inv @ A_i for each lag, so first we use ar_lag as a "batch dim",
    # then we'll fix it later.
    lag_coefs = pm.Normal('lag_matrices', 0, 1, dims=['ar_lag', 'observed_state', 'observed_state_aux'])

    # Pytensor will automatically vectorize solve across the left-most dimension. We have to tell it
    # that b has 2 dims, otherwise it will solve A against column vectors with 2 batch dims, which
    # isn't what we want.
    Gammas = pt.linalg.solve(A, lag_coefs, b_ndim=2, assume_a='gen')

    # Now that we have Gammas, we can use swapaxes to put the dims into the right order.
    # Here, the name of the variable is extremely important -- this is what will actually go into the T matrix.
    ar_params = pm.Deterministic('ar_params', pt.swapaxes(Gammas, 0, 1),
                                 dims=['observed_state', 'ar_lag', 'observed_state_aux'])

    # Similarly, we want to build the covariance matrix, then give it the right name in a deterministic.
    # First, get A^-1 @ B
    A_inv_B = pt.linalg.solve(A, B, assume_a='gen')

    # Then "square" it and put it in a deterministic with the right name.
    state_cov = pm.Deterministic('state_cov', A_inv_B @ A_inv_B.T, dims=['shock', 'shock_aux'])

    # Using all-nan data will draw unconditional trajectories based on the prior (the Kalman filter,
    # having nothing to learn from, will just do nothing)
    mod.build_statespace_graph(np.full((100, 3), np.nan), mode='JAX')

# This model is a mess -- the priors are really nasty and you'll have a hard time sampling numerically
# stable trajectories. JAX has somewhat more stable linalg routines, so we want to use it to do forward
# sampling too.
with freeze_dims_and_data(m):
    prior = pm.sample_prior_predictive(compile_kwargs={'mode': 'JAX'})
Option 2 is to make a custom statespace model to handle all of these reshapes and solves for you, and ideally open a PR to include it for others to use in the future! Explanation here, and some conversations here.