The shape of the other objects could also just be (dim, dim); that depends on whether they are themselves time-varying or not. It's not necessarily connected to having an exogenous regression component per se.
Let's look at some concrete notation. Start from the general statespace definition, where all parameters are constant across time:
\begin{align}
x_t &= T x_{t-1} + c + R \varepsilon_t \\
y_t &= Z x_t + d + H \eta_t
\end{align}
To add an exogenous regression, you have two options. First, you could think of it as a static regression with statespace residuals. That is, what you observe is:
\begin{align}
w_t &= \sum_{i=0}^k \beta_i W_{i,t} + \zeta_t \\
\zeta_t &= Z x_t + H \eta_t \\
x_t &= T x_{t-1} + c + R \varepsilon_t
\end{align}
where \zeta_t are the residuals of the regression and W is a matrix of exogenous features. Note that I set d = 0. We can plug \zeta_t = w_t - \sum_{i=0}^k \beta_i W_{i,t} into the second equation:
w_t - \sum_{i=0}^k \beta_i W_{i,t} = Z x_t + H \eta_t
And isolate w_t:
w_t = Z x_t + \sum_{i=0}^k \beta_i W_{i,t} + H \eta_t
Via pattern matching, we see that we could have just started with w_t = y_t and d_t = \sum_{i=0}^k \beta_i W_{i,t}, and we get back to the base case, except that d_t is now time-varying. What this means is that you run a regression totally outside of your statespace model, then plug the result in as the observation intercept. Note that nothing in the transition equation was touched, so all of that can remain the same.
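As a minimal numpy sketch of option 1 (all variable names and the toy data are illustrative, not from any library): fit the regression outside the statespace model, then treat the fitted values as a time-varying observation intercept d_t, leaving only the residuals for the statespace dynamics to explain.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3

# Toy exogenous features and observed series
W = rng.normal(size=(n, k))
beta_true = np.array([0.5, -1.0, 2.0])
w = W @ beta_true + rng.normal(scale=0.1, size=n)

# Step 1: a static regression, fit entirely outside the statespace model
beta_hat, *_ = np.linalg.lstsq(W, w, rcond=None)

# Step 2: the fitted regression becomes the time-varying observation
# intercept d_t; the statespace model only has to explain the residuals
d_t = W @ beta_hat               # shape (n,): one intercept per time step
residuals = w - d_t              # what Z x_t + H eta_t must account for
```

The key point the algebra above makes is that this two-step procedure is exactly the same observation equation as before, just with d promoted to a time-varying d_t.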
Option 2, as you say, is to put the data W_t into the design matrix and include the \beta_i as part of the state vector x_t. Define \hat{x}_t = \begin{bmatrix} x_t \\ \beta \end{bmatrix}. Now you need transition dynamics for the betas; to keep it simple, define \hat{T} = \begin{bmatrix}T & 0 \\ 0 & I_{k,k} \end{bmatrix}. Finally, give the betas no shocks, so \hat{R} = \begin{bmatrix} R \\ 0 \end{bmatrix}. The transition system is now:
\hat{x}_t = \hat{T} \hat{x}_{t-1} + \hat{R} \varepsilon_t
For the observation equation, we need to figure out what to do with y_t = Z x_t. We want to end up with y_t = Z x_t + \sum_{i=0}^k \beta_i W_{i,t}, which, as you already pointed out, requires that we concatenate the row W_t to the Z matrix, so \hat{Z}_t = \begin{bmatrix} Z & W_t \end{bmatrix}. Then \hat{Z}_t \hat{x}_t = \begin{bmatrix} Z & W_t \end{bmatrix} \begin{bmatrix} x_t \\ \beta \end{bmatrix} = Z x_t + W_t \beta gives us exactly the quantity we want (and, importantly, the same thing as in case 1).
In this case, Z_t becomes time-varying, while the other objects T, c, d, R, Q, H remain static (although their shapes have changed to accommodate the regression betas as new state variables).
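The block structure of option 2 can be sketched directly in numpy (dimensions and names here are illustrative); the final assertion checks that the augmented observation equation reproduces Z x_t + W_t \beta, as derived above.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, k = 2, 3                     # state dimension and number of regressors

T = rng.normal(size=(dim, dim))   # original transition matrix
R = np.eye(dim)                   # original selection matrix
Z = rng.normal(size=(1, dim))     # original design matrix (one observed series)
W_t = rng.normal(size=(1, k))     # row of exogenous data at time t

# Augmented system matrices for option 2
T_hat = np.block([[T, np.zeros((dim, k))],
                  [np.zeros((k, dim)), np.eye(k)]])  # betas get identity dynamics
R_hat = np.vstack([R, np.zeros((k, dim))])           # betas receive no shocks
Z_hat_t = np.hstack([Z, W_t])                        # time-varying design matrix

# Check: the augmented observation equation equals Z x_t + W_t beta
x_t = rng.normal(size=dim)
beta = rng.normal(size=k)
x_hat_t = np.concatenate([x_t, beta])
assert np.allclose(Z_hat_t @ x_hat_t, Z @ x_t + W_t @ beta)
```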
For the record, the `ExogenousRegression` component does the latter method, incorporating the W matrix into a time-varying Z_t matrix. You can always do the first method by just setting the observation intercept manually; literally `ss_mod.ssm['obs_intercept'] = W @ beta`. You won't be able to forecast with that model using the built-in helpers, though, because all of the internal `set_data` machinery won't know about the `W` matrix.
Historically, people have preferred option 2 because it gives an easy way to do time-varying regression: just add some innovations to the \beta states and the Kalman filter will do the rest. From an algebraic standpoint, this is exactly identical to setting `beta = pm.GaussianRandomWalk()` and doing option 1. From a sampling standpoint, though, I haven't tested it.