Structural Bayesian Vector Autoregression

Really enjoyed the posts by, eg, @ricardoV94 on BVARs.

I’m interested in creating a simple structural bayesian vector autoregression using PyMC. These are like VAR(p) models but have a number of potential restrictions imposed:

  • sign restrictions
  • long run restrictions
  • theoretically informed or narrative restrictions

In an economic context, that last set of restrictions might include rules that:

  • Only shocks to output can shift output contemporaneously.
  • Only shocks to output and inflation can shift inflation contemporaneously.
  • All shocks can affect the interest rate contemporaneously.

A good example of that combines all of this functionality can be found in the R packages bsvarSIGNs. There’s also a neat worked through example here.

The closest example I’ve been able to find in the PyMC literature is this BVAR page but it doesn’t have any ‘structural’ restrictions in it. Statsmodels supports SVARS too; the page is here.

Is it possible to implement SBVARs in PyMC? And how would one do it?

You can use the BayesianVARMA model in pymc_extras.statespace. Example notebooks here and here, and I also talk about them in this video.

You have full control over the coefficient matrix and covariance matrix when doing inference (so you can impose zeros in either), and over the covariance matrix used when doing impulse response function. There’s an option to do cholesky decomposition on the covariance matrix out of the box, but it needs a bit of work. PRs welcome :slight_smile:

3 Likes

Super helpful, thank you (and for the very prompt reply!)

So, if I’ve understood correctly, to implement the equivalent of the a.mat and b.mat from this example:

# a.mat
##      [,1] [,2] [,3]
## [1,]   NA    0    0
## [2,]   NA   NA    0
## [3,]   NA   NA   NA
# b.mat
##      [,1] [,2] [,3]
## [1,]   NA    0    0
## [2,]    0   NA    0
## [3,]    0    0   NA

The lines in your first example one would have to change would be:

    state_chol, _, _ = pm.LKJCholeskyCov(
        "state_chol", eta=1, n=bvar_mod.k_posdef, sd_dist=pm.Exponential.dist(lam=1)
    )

    ar_params = pm.Normal("ar_params", mu=0, sigma=1, dims=ar_dims)
    state_cov = pm.Deterministic("state_cov", state_chol @ state_chol.T, dims=state_cov_dims)

Am I right in saying that a.mat is ar_params and b.mat is state_cov in your example? And apologies if this is a stupid question, but how do you modify the pm.Normal and pm.Deterministic lines to implement matrix structures with zeros in the right places?

The statespace package is always working with a formula like this:

\begin{align} y_t &=Z x_t + d + \eta_t, & \eta_t \sim N(0, H) \\ x_t &= T x_{t-1} + c + R \varepsilon_t, & \varepsilon \sim N(0, Q) \end{align}

Looking at this formula from the statsmodel docs:

Ay_t = A_1 y_{t-1} + \ldots + A_p y_{t-p} + B \varepsilon_t, \quad \varepsilon_t \sim N(0, I_k)

I write I_k because we assume y_t has shape k \times 1 (there are k endogenous variables) and we’re working with a VAR(p, 0) model. I assume the errors are spherical in statsmodels (that is anything we know about the covariance is pre-factored out into B), but I actually don’t know – that’s an important detail to check.

Anyway, to get from one to the other, you’re going to have to do some rewriting:

y_t = A^{-1} A_1 y_{t-1} + \dots + A^{-1} A_p y_{t-p} + A^{-1} B \varepsilon_t

Call A^{-1} A_p = \Gamma_p, so the system is y_t = \sum_{s=1}^p \Gamma_p y_{t-p} + A^{-1} B \varepsilon_t

In the statespace framework, you work with everything “stacked” into a single state vector. So define:

x_t = \begin{bmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p} \end{bmatrix}

Next you need a map from \{\Gamma_1, \dots, \Gamma_p\} to T. This is done with vectorization. Define \text{rvec}(X) as a “row vectorize” operator that stacks the rows of a matrix into a single row vector from top to bottom (the usual \text{vec} operator works on columns, so I’m try to avoid confusion)

<BEGIN EDIT>

The formula I had before for the T matrix was wrong. I should have taken my advice and looked at the digression. Here’s an example of T for a VAR(2, 0). The indices are in the form (equation, coefficient, lag order):

\begin{bmatrix} a_{x,x,1} & a_{x,y,1} & a_{x,z,1} & a_{x,x,2} & a_{x,y,2} & a_{x,z,2} \\ a_{y,x,1} & a_{y,y,1} & a_{y,z,1} & a_{y,x,2} & a_{y,y,2} & a_{y,z,2} \\ a_{z,x,1} & a_{z,y,1} & a_{z,z,1} & a_{z,x,2} & a_{z,y,2} & a_{z,z,2} \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}

You see there there are 4 square blocks. On top there’s one for the lag 1 coefficients, with equations on the rows and coefficients on the columns, then the same for lag 2. Then there’s a k \times k identity matrix, and a k \times k zero matrix. So in block form, we have:

T = \begin{bmatrix} \Gamma_1 & \Gamma_2 \\ I_k & 0_k \end{bmatrix}

Continuing the pattern, for VAR(3) one has:

T = \begin{bmatrix} \Gamma_1 & \Gamma_2 & \Gamma_3 \\ I_k & 0_k & 0_k \\ 0_k & I_k & 0_k \end{bmatrix}

And so on.

<END EDIT>

There’s a section in the linked notebook called " Digression: A note on representation" that would be useful to look at now, to see how this would look in a VAR(1,0) case with 3 endogenous variables.

Now we just need R and Q. \varepsilon_t isn’t going to change. So the job of R is to make it conform in shape to the new state vector x_t. Nothing in x_t gets shocks except the first k entries (the ones associated with t-1, so R is going to have a lot of zeros. One choice is to define R = \begin{bmatrix} A^{-1} B \\ 0 \\ \vdots \\ 0 \end{bmatrix}, which implies that Q = I_k.

Alternatively, you could decide to define R = \begin{bmatrix} I_k \\ 0 \\ \vdots \\ 0 \end{bmatrix}, in which case you’re now forced to pick Q = A^{-1} B B^T A^{-T} (that follows by analolgy from multiplying a scalar random variable a \sim N(0, 1) by a scalar b gives a \cdot b \sim N(0, b^2)

With that we’re basically done. Define Z = \begin{bmatrix} I_k & 0 & \dots & 0 \end{bmatrix} (it just picks out the top k elements of the x_t vector), and we have everything in statespace form.

From here, you have two choices:

Option 1 is to just use the pre-defined BayesianVARMA model directly. This will force you to go with the covariance definition of R and Q, because it internally sets R to the one with the I_k on top.
a. Going with the example you gave, you need 6 coefficients in the A matrix, Sample those, then arrange them into a lower triangular matrix.
b. Next, sample your VAR coefficient matrices. Then construct the \Gamma matrices, and finally ravel, concatenate, and reshape into the form expected by the model
c. Finally, do the same for the covariance, computing A^{-1} B B^T A^{-T}

This model would look something like this (priors are dumb on purpose):

from pymc_extras.statespace import BayesianVARMAX
import pymc as pm
import pytensor.tensor as pt
import numpy as np

mod = BayesianVARMAX(order=(2, 0), 
                     endog_names=['x', 'y', 'z'],
                     stationary_initialization=True)

k = mod.k_posdef
A_mask = np.array([[1, 0, 0],
                   [1, 1, 0],
                   [1, 1, 1]])
B_mask = np.eye(3)

n_nonzero_A = A_mask.ravel().sum()
n_nonzero_B = B_mask.ravel().sum()

with pm.Model(coords=mod.coords) as m:
    A_coefs = pm.Normal('A_coefs', 0, 1, shape=(n_nonzero_A, ))
    B_coefs = pm.Normal('B_coefs', 0, 1, shape=(n_nonzero_B, ))
    
    # .set is necessary in pytensor to allocate a value inside a matrix. Here, we use A_mask 
    # and B_mask to do fancy indexing 
    A = pm.Deterministic('A',
                         pt.zeros((k, k))[A_mask.astype(bool)].set(A_coefs), 
                         dims=['observed_state', 'observed_state_aux'])
    B = pm.Deterministic('B', 
                         pt.zeros((k, k))[B_mask.astype(bool)].set(B_coefs),
                         dims=['observed_state', 'observed_state_aux'])
    
    # According to the build report, ar_lags needs to be shape ['observed_state', 'ar_lags', 'observed_state_aux']. 
    # But we need compute A_inv @ A_i for each lag, so first we use ar_lag as a "batch dim",
    # then we'll fix it later.
    lag_coefs = pm.Normal('lag_matrices', 0, 1, dims=['ar_lag', 'observed_state', 'observed_state_aux'])

    # Pytensor will automatically vectorize solve across the left-most dimension. We have to tell it
    # that b has 2 dims, otherwise it will solve A against column vectors with 2 batch dims, which
    # isn't what we want.
    Gammas = pt.linalg.solve(A, lag_coefs, b_ndim=2, assume_a='gen')
    
    # Now that we have Gammas, we can use swapaxes to put the dims into the right order.      
    # Here, the name of the variable is extremely important -- this is what will actually go into the T matrix.
    ar_params = pm.Deterministic('ar_params', pt.swapaxes(Gammas, 0, 1), dims=['observed_state', 'ar_lag', 'observed_state_aux'])
    
    # Similarly, we want to build to covariance matrix, then give it the right name in a deterministic
    # First, get A^-1 @ B
    A_inv_B = pt.linalg.solve(A, B, assume_a='gen')

    # Then "square" it and put it in a deterministic with the right name.
    state_cov = pm.Deterministic('state_cov', A_inv_B @ A_inv_B.T, dims=['shock', 'shock_aux'])
    
    # Using all nan data will draw unconditional trajectories based on the prior (the kalman filter, having nothing to learn from, will just do nothing)
    mod.build_statespace_graph(np.full((100, 3), np.nan),
                               mode='JAX')
    
# This model is a mess -- the priors are really nasty and you'll have a hard time sampling numerically stable trajectories
# JAX has somewhat more stable linalg routines, so we want to use it to do forward sampling too.
with freeze_dims_and_data(m):
    prior = pm.sample_prior_predictive(compile_kwargs={'mode':'JAX'})

Option 2 is to make a custom statespace model to handle all of these reshapes and solves for you, and ideally open a PR to include it for others to use in the future! Explanation here, and some conversations here.

3 Likes

Amazing, thank you so much for this. Think I would have been quite stuck without your example. I will have a go with it. Yeah, it would be amazing to have this is all wrapped up nicely, at least into a custom state space model but ideally into a package that handled other typical SVAR type actions (eg computing and plotting impulse response functions). Really appreciate your help!

IRFs are handled by the BayesianVARMA class, see the example notebooks for details.