Shape of observation intercept matrix in State Space

Hi, I have been working with the State Space module and I am confused by the shape of the observation intercept matrix. Below I am using a simple example provided on the pymc-experimental repo with the only change in k_endog set to 2.

# assuming the import path used in the pymc-experimental statespace examples
from pymc_experimental.statespace.core.statespace import PyMCStateSpace


class AutoRegressiveThree(PyMCStateSpace):
    def __init__(self):
        k_states = 3  # size of the state vector x
        k_posdef = 1  # number of shocks (size of the state covariance matrix Q)
        k_endog = 2  # number of observed states

        super().__init__(k_endog=k_endog, k_states=k_states, k_posdef=k_posdef)

    def make_symbolic_graph(self):
        # We will implement this in a moment. For now, we need to override it
        # with a no-op to avoid a NotImplementedError when we initialize a
        # class instance.
        pass


ar3 = AutoRegressiveThree()
ar3.ssm.obs_intercept.shape.eval()

which prints array([1, 2]),

but I thought it would be (2, 1) because:

Z shape = (k_endog, k_states) = (2, 3)
x shape = (k_states, 1) = (3, 1)

Matrix Vector multiplication shape = (2,3) @ (3, 1) = (2, 1)
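The shape arithmetic I expected can be checked directly with NumPy (arbitrary placeholder values, just to verify the shapes):

```python
import numpy as np

k_endog, k_states = 2, 3

Z = np.zeros((k_endog, k_states))  # design matrix, shape (2, 3)
x = np.zeros((k_states, 1))        # state vector as a column, shape (3, 1)

d = Z @ x  # matrix-vector product
print(d.shape)  # (2, 1)
```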

Why isn’t the shape of d (2, 1) instead of (1, 2)?

Am I overlooking something obvious here?

Hi @Dekermanjian !

This is an esoteric implementation detail, but it’s (poorly) documented here. The intercept object is a vector, so its shape is actually just (2,). All the objects in ssm store a batch dimension on the far left to allow for the possibility that they are time varying. For example, if you look at the transition matrix with ar3.ssm.transition.shape.eval(), you will see that its shape is (1, 3, 3) rather than (3, 3).

If you instead ask for the matrices using keys, like ar3.ssm['obs_intercept'], there is some logic to remove this batch dimension when it isn’t being used. This will give you the shape (2,) as expected. If you had a model with a time-varying obs intercept (for example an exogenous regression component), then it would not clip this first dimension.
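The squeezing behavior described here can be sketched in plain NumPy. This is a hypothetical stand-in for the key-access logic, not the actual pymc-experimental implementation:

```python
import numpy as np

def get_matrix(batched, time_varying=False):
    """Mimic the described key-access logic: drop the leading batch
    dimension when the object is not time varying (sketch only)."""
    if not time_varying and batched.shape[0] == 1:
        return batched[0]
    return batched

# Static obs intercept, stored with a size-1 batch dim like ssm.obs_intercept
obs_intercept = np.zeros((1, 2))
print(get_matrix(obs_intercept).shape)  # (2,)

# Time-varying obs intercept: the leading dim now holds the time axis
tv_intercept = np.zeros((100, 2))
print(get_matrix(tv_intercept, time_varying=True).shape)  # (100, 2)
```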

Thank you very much for the clarification @jessegrabowski! I am asking this question exactly because I am attempting to add an exogenous regression component to the model. So if I understand you correctly then there are 2 ways to add exogenous regression components:

1). If the component is time-varying, then I would put my regression component into the “obs_intercept” vector, where the shape of the regression params * exogenous data = (n_obs, k_endog), AND all the other matrices need to have a defined shape = (n_obs, dim, dim)

2). If the component is non-time-varying, then it is the same as above, except that params is of shape (1,) instead of shape (n_obs,). Thus, the resulting shape of “obs_intercept” is still (n_obs, k_endog). Will I need to define all other matrices to have shape (n_obs, dim, dim) as well in this case?

I believe that you could also put the exogenous data into the design matrix and the regression parameters into the state vector as another way to include a time-varying regression component. Would that also work in the State Space module?

The shape of the other objects could also just be (dim, dim); that depends on whether they are themselves time varying or not. It’s not necessarily connected to having an exogenous regression component per se.
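For concreteness, a sketch of the shape bookkeeping with illustrative dimensions: only the time-varying objects need a leading time axis, and a static matrix can always be broadcast to one if a routine expects it.

```python
import numpy as np

n_obs, k_endog, k_states = 100, 2, 3

# Time-varying observation intercept: one row per time step
obs_intercept = np.zeros((n_obs, k_endog))  # (100, 2)

# Static matrices can stay 2-D; no time axis required
T = np.eye(k_states)                        # (3, 3)
Z = np.zeros((k_endog, k_states))           # (2, 3)

# If a time axis is needed, broadcast without copying data
T_batched = np.broadcast_to(T, (n_obs, k_states, k_states))
print(T_batched.shape)  # (100, 3, 3)
```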

Let’s look at some concrete notation. Start from the general statespace definition, where all parameters are constant across time:

\begin{align} x_t &= T x_{t-1} + c + R \varepsilon_t \\ y_t &= Z x_t + d + H \eta_t \end{align}

To think about an exogenous regression, you have two options. First, you could think about a static regression with statespace residuals. That is, what you observe is:

\begin{align} w_t &= \sum_{i=0}^k \beta_i W_{i,t} + \zeta_t \\ \zeta_t &= Z x_t + H \eta_t \\ x_t &= T x_{t-1} + c + R \varepsilon_t \\ \end{align}

Where \zeta_t are the residuals of the regression, and W is a matrix of exogenous features. Note that I set d = 0. We can plug \zeta_t = w_t - \sum_{i=0}^k \beta_i W_{i,t} into the 2nd equation:

w_t - \sum_{i=0}^k \beta_i W_{i,t} = Z x_t + H \eta_t

And isolate w_t:

w_t = Z x_t + \sum_{i=0}^k \beta_i W_{i,t} + H \eta_t

Via pattern matching, we see that we could have just started with w_t = y_t and d_t = \sum_{i=0}^k \beta_i W_{i,t}, and we get back to the base case, except that d_t is now time varying. What this would mean is that you run a regression totally outside of your statespace model, then plug in the result as the observation intercept. Note that nothing in the transition equation was touched, so all of that can remain the same.
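Option 1 in code, with hypothetical W and beta values just to show the shapes: the whole regression term becomes a time-varying observation intercept, one row per time step.

```python
import numpy as np

rng = np.random.default_rng(0)

n_obs, k_exog, k_endog = 100, 3, 2

W = rng.normal(size=(n_obs, k_exog))       # exogenous features, one row per t
beta = rng.normal(size=(k_exog, k_endog))  # regression coefficients

# d_t = sum_i beta_i * W_{i,t}, stacked over t into a (n_obs, k_endog) array
d = W @ beta
print(d.shape)  # (100, 2)
```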

Option 2, as you say, is to put the data W_t into the design matrix, and include the \beta_i as part of the state vector x_t. Define \hat{x}_t = \begin{bmatrix} x_t \\ \beta \end{bmatrix}. Now you need transition dynamics for the betas; to keep it simple, define \hat{T} = \begin{bmatrix}T & 0 \\ 0 & I_{k,k} \end{bmatrix}. Finally, give the betas no shocks, so \hat{R} = \begin{bmatrix} R \\ 0 \end{bmatrix}. The transition system is now:

\hat{x}_t = \hat{T} \hat{x}_{t-1} + \hat{R} \varepsilon_t

For the observation equation, we need to figure out what to do with y_t = Z x_t. We want to end up with y_t = Z x_t + \sum_{i=0}^k \beta_i W_{i,t}, which, as you already pointed out, requires concatenating the row W_t to the Z matrix: \hat{Z}_t = \begin{bmatrix} Z & W_t \end{bmatrix}. Then \hat{Z}_t \hat{x}_t = \begin{bmatrix} Z & W_t \end{bmatrix} \begin{bmatrix} x_t \\ \beta \end{bmatrix} = Z x_t + W_t \beta gives us exactly the quantity we want (and, importantly, the same thing as in case 1).

In this case, Z_t becomes time-varying, while the other objects T, c, d, R, Q, H remain static (although their shapes have changed to accommodate the regression betas as new state variables).
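The augmented-system construction above can be verified numerically. This sketch (arbitrary random values, small illustrative dimensions) builds \hat{T}, \hat{R}, and \hat{Z}_t as block matrices and checks that the augmented observation equation reproduces Z x_t + W_t \beta:

```python
import numpy as np

rng = np.random.default_rng(0)

k_endog, k_states, k_exog, k_posdef = 1, 3, 2, 1

# Original system matrices and a state draw (arbitrary values)
Z = rng.normal(size=(k_endog, k_states))
T = rng.normal(size=(k_states, k_states))
R = rng.normal(size=(k_states, k_posdef))
x_t = rng.normal(size=k_states)
beta = rng.normal(size=k_exog)
W_t = rng.normal(size=k_exog)  # exogenous data row at time t

# Augmented state: stack the betas under the original states
x_hat = np.concatenate([x_t, beta])

# T_hat = [[T, 0], [0, I]]: the betas follow constant (identity) dynamics
T_hat = np.block([
    [T, np.zeros((k_states, k_exog))],
    [np.zeros((k_exog, k_states)), np.eye(k_exog)],
])

# R_hat = [[R], [0]]: the betas receive no shocks
R_hat = np.vstack([R, np.zeros((k_exog, k_posdef))])

# Z_hat_t = [Z, W_t]: the exogenous row enters the design matrix
Z_hat_t = np.hstack([Z, W_t[None, :]])

# Both routes give the same observation mean: Z x_t + W_t . beta
print(np.allclose(Z_hat_t @ x_hat, Z @ x_t + W_t @ beta))  # True
```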

For the record, the ExogenousRegression component uses the latter method, folding the W matrix into a time-varying Z_t matrix. You can always do the first method by just setting the observation intercept manually; literally ss_mod.ssm['obs_intercept'] = W @ beta. You won’t be able to forecast with that model using the built-in helpers, though, because all of the internal set_data machinery won’t know about the W matrix.

Historically, people have preferred option 2 because it gives an easy way to do time-varying regression: just add some innovations to the \beta states and the Kalman Filter will do the rest. From an algebraic standpoint, this is exactly identical to setting beta = pm.GaussianRandomWalk() and doing option 1. From a sampling standpoint, though, I haven’t tested it 🙂

@jessegrabowski Thank you! This has concretely clarified so many things for me.

Jesse, can you point me to some documentation on using the ExogenousRegression component?

I would like to use the built-in helpers for getting forecasts. I notice that I need to set the properties data_name() and data_info(), and register the data with make_and_register_data, but I am running into shape errors: AssertionError: SpecifyShape: dim 0 of input has shape 1001, expected 1, where 1001 is the number of observations.

There are examples here and here.

This discussion made me realize that the API docs for the ExogenousRegression component are missing, so I’ll open a PR to fix that.


@jessegrabowski Thank you so much!!