Setting up a multiple-input multiple-output regularized Bayesian regression

I have a model of the form x_{t+1} = A x_t (a linear dynamical system) and want to estimate the coefficients of A \in \mathbb{R}^{n\times n} by regressing x_{t+1} \in \mathbb{R}^n onto x_t \in \mathbb{R}^n, given m data pairs (x_{t+1}, x_t). I have yet to find an approach for implementing this as a Bayesian regression, so any help getting started would be greatly appreciated.

On top of that, I would like to add a regularizing term that penalizes the Frobenius norm of A (equivalently, an element-wise L2 penalty on the entries of A).
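Concretely, the non-Bayesian objective I have in mind is the penalized least-squares problem \hat{A} = \arg\min_A \sum_t \lVert x_{t+1} - A x_t \rVert_2^2 + \lambda \lVert A \rVert_F^2; I am after its Bayesian analogue.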

Thanks in advance :slight_smile:

Here’s a model that should implement what you’re looking for. I would normally suggest using one of the time-series distributions, since you are describing a vector autoregressive order-1 (VAR(1)) system, but there are some rough edges in dealing with shapes there.

import numpy as np
import pymc3 as pm
import theano.tensor as tt

def frobenius_norm(X):
    # trace(X X^T) is the sum of squared entries of X
    return tt.nlinalg.trace(X @ X.T) ** 0.5

# Create simulated data via forward evolution of the system
K = 3                  # state dimension
T = 20                 # number of time steps
error_sd_true = 0.5    # noise scale used to generate the data

A_true = np.random.randn(K, K)

x = np.zeros([T, K])   # one row per time step
x[0] = np.random.randn(K)

for t in range(1, T):
    jump = np.random.randn(K)
    x[t] = A_true @ x[t-1] + jump * error_sd_true
    
penalty_weight = 0.1

with pm.Model():

    A_scale  = pm.HalfCauchy('A_scale', beta=1)
    A        = pm.Normal('A', sd=A_scale, shape=[K, K])
    error_sd = pm.HalfCauchy('error_sd', beta=1)

    # We regress the shifted observations on the earlier observations,
    # with the regression weights given by A. The right-multiplication
    # by A is due to the row-per-timestep shape of x, so A here
    # estimates the transpose of A_true.
    pm.Normal('x_end', mu=x[0:-1] @ A, sd=error_sd, observed=x[1:])

    # pm.Potential adds its argument to the model log-probability,
    # so the penalty must enter with a negative sign.
    pm.Potential('A_penalty', -penalty_weight * frobenius_norm(A))

    trace = pm.sample()

Now, you may want to play around with the choice of prior for A. Right now there are two sources of regularization: the pm.Normal prior placed on its entries (essentially an element-wise L2 penalty) and the Frobenius-norm potential computed via the trace of A A^T. I also tried a version that omits the pm.Normal prior and replaces it with pm.Flat, which places a flat (improper) prior on the entries. It works, but it takes longer to sample because the problem is more poorly conditioned; see the sketch below.
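For reference, the pm.Flat variant just swaps out the prior on A; reusing x, K, penalty_weight, and frobenius_norm from above, it looked roughly like this:

with pm.Model():
    # Flat (improper) prior on the entries of A; the Frobenius
    # potential now supplies all of the regularization
    A        = pm.Flat('A', shape=[K, K])
    error_sd = pm.HalfCauchy('error_sd', beta=1)
    pm.Normal('x_end', mu=x[0:-1] @ A, sd=error_sd, observed=x[1:])
    pm.Potential('A_penalty', -penalty_weight * frobenius_norm(A))
    trace = pm.sample()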

I appreciate the help with this! When running your script I get the following TypeError:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.8/site-packages/theano/tensor/type.py in dtype_specs(self)
    264         try:
--> 265             return {
    266                 "float16": (float, "npy_float16", "NPY_FLOAT16"),

KeyError: 'object'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-14-b744de5862b4> in <module>
     28     # with A is only because of the shape we set up earlier for x.
     29     pm.Normal('x_end', mu=x[0:-1]@A, observed=x[1:], sd=error_sd)
---> 30     pm.Potential('A_penalty', frobenius_norm(A) * penalty_weight)
     31 
     32     trace = pm.sample()

<ipython-input-14-b744de5862b4> in frobenius_norm(X)
      1 def frobenius_norm(X):
----> 2     return tt.sum(tt.nlinalg.trace(A@A.T))**0.5
      3 
      4 # Create simulated data via forward evolution of system
      5 K        = 3

/usr/local/lib/python3.8/site-packages/theano/tensor/nlinalg.py in trace(X)
    232 
    233     """
--> 234     return extract_diag(X).sum()
    235 
    236 

/usr/local/lib/python3.8/site-packages/theano/graph/op.py in __call__(self, *inputs, **kwargs)
    248         """
    249         return_list = kwargs.pop("return_list", False)
--> 250         node = self.make_node(*inputs, **kwargs)
    251 
    252         if config.compute_test_value != "off":

/usr/local/lib/python3.8/site-packages/theano/tensor/basic.py in make_node(self, x)
   6563 
   6564     def make_node(self, x):
-> 6565         x = as_tensor_variable(x)
   6566 
   6567         if x.ndim < 2:

/usr/local/lib/python3.8/site-packages/theano/tensor/basic.py in as_tensor_variable(x, name, ndim)
    205         )
    206 
--> 207     return constant(x, name=name, ndim=ndim)
    208 
    209 

/usr/local/lib/python3.8/site-packages/theano/tensor/basic.py in constant(x, name, ndim, dtype)
    253         assert x_.ndim == ndim
    254 
--> 255     ttype = TensorType(dtype=x_.dtype, broadcastable=[s == 1 for s in x_.shape])
    256 
    257     try:

/usr/local/lib/python3.8/site-packages/theano/tensor/type.py in __init__(self, dtype, broadcastable, name, sparse_grad)
     52         # True or False
     53         self.broadcastable = tuple(bool(b) for b in broadcastable)
---> 54         self.dtype_specs()  # error checking is done there
     55         self.name = name
     56         self.numpy_dtype = np.dtype(self.dtype)

/usr/local/lib/python3.8/site-packages/theano/tensor/type.py in dtype_specs(self)
    280             }[self.dtype]
    281         except KeyError:
--> 282             raise TypeError(
    283                 f"Unsupported dtype for {self.__class__.__name__}: {self.dtype}"
    284             )

TypeError: Unsupported dtype for TensorType: object

Would you be able to help with this error? It’s not something I have encountered before.

Edit: I wanted to add that the error arises when calling frobenius_norm on the symbolic matrix A.

Would you be able to share your PyMC and Theano versions via pm.__version__ and theano.__version__? Mine was running on PyMC3 3.11.4 and Theano 1.1.2. Also, I edited the example because, for some random seeds, the data were not informative enough to make the problem tractable.
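On a PyMC3 install, that’s just:

import pymc3 as pm
import theano
print(pm.__version__, theano.__version__)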

I’m running Theano 1.1.2 and PyMC v4.0.0b2

I can try downgrading PyMC and see if that solves the problem.
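I assume that would be something like the following, which should also pull in a compatible Theano:

pip install pymc3==3.11.4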

Another question I had, if I may ask:
I read through the paper on Probabilistic Matrix Factorization and noticed that simply by taking normal priors over the columns of the factor matrices (U and V), the Frobenius norms are directly penalized in the sum-of-squared-errors objective function (eq. 4). Does this imply that I shouldn’t explicitly penalize the Frobenius norm of A if there is already a normal prior on it? Sorry for the unrelated question, but I thought I’d ask and continue to pick your brain if you have the time.

I haven’t upgraded to PyMC v4, so that would likely be the source of the error: v4 replaced Theano with Aesara, and the theano.tensor functions in my script (e.g. tt.nlinalg.trace) can’t consume the Aesara variables that v4 creates, which is why as_tensor_variable falls back to an object dtype.
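If you’d rather stay on v4, one way to sidestep the mismatch might be to route the computation through pm.math, which dispatches to whichever backend your PyMC version uses (Theano on PyMC3, Aesara on v4). Untested on v4 on my end, but the sketch would be:

# Backend-agnostic sketch of the penalty term: the sum of squared
# entries is the squared Frobenius norm, so no trace is needed
def frobenius_norm(X):
    return pm.math.sqrt(pm.math.sum(pm.math.sqr(X)))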

With regard to the penalization: I read the part you mentioned in that paper, where the authors note that an IID normal prior is equivalent to a squared Frobenius norm penalty on A. I didn’t know that ahead of time, but it’s not surprising that it’s the case; it looks like you can omit the explicit Frobenius penalty from the code I wrote.
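To spell out the equivalence: if the entries a_{ij} of A have IID N(0, \sigma^2) priors, then \log p(A) = -\frac{1}{2\sigma^2} \sum_{i,j} a_{ij}^2 + \text{const} = -\frac{1}{2\sigma^2} \lVert A \rVert_F^2 + \text{const}, so the prior already contributes a squared Frobenius penalty with weight 1/(2\sigma^2). (Strictly speaking, the pm.Potential above penalizes \lVert A \rVert_F rather than its square, so the two terms are not identical, but they play the same regularizing role.)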