I did some more work on this, and I was able to get the skeleton of a full state space model working in Aesara. I broke it into modules and put them in a repo here with the local level example.
The core of the system is shamelessly copied from statsmodels.tsa.statespace into the AesaraRepresentation class. This is a wrapper class that holds the state space matrices, with __getitem__ and __setitem__ methods that allow specific elements of the matrices to be set like this:
state_space['obs_cov', 0, 0] = theta[0]
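Stripped way down, the indexing mechanics look roughly like this (a simplified sketch with made-up names, not the actual class; the real AesaraRepresentation holds the full set of statsmodels-style system matrices):

import aesara.tensor as at

class MatrixContainer:
    # Simplified sketch of the idea: each named matrix is a symbolic tensor, and
    # __setitem__ swaps in a new tensor with the requested element replaced
    def __init__(self, k_endog, k_states):
        self.matrices = {
            'obs_cov': at.zeros((k_endog, k_endog)),
            'state_cov': at.zeros((k_states, k_states)),
            # ... design, transition, selection, initial_state, and so on
        }

    def __getitem__(self, key):
        name, *idx = key        # key is ('matrix_name', i, j)
        return self.matrices[name][tuple(idx)]

    def __setitem__(self, key, value):
        name, *idx = key
        self.matrices[name] = at.set_subtensor(self.matrices[name][tuple(idx)], value)

So container['obs_cov', 0, 0] = theta[0] rewires that element of the matrix to depend on theta.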
Where theta is a flat parameter vector. Copying from the statsmodels setup, this AesaraRepresentation is held by a PyMC3StateSpace class, which also holds the data and a KalmanFilter object. The Kalman filter is essentially unchanged from what I posted earlier, but no longer returns a compiled function.
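For anyone who didn't see the earlier post, the filter itself is just the standard Kalman recursions scanned over the data. A stripped-down sketch of the idea (not the exact code in the repo; matrix names follow the statsmodels convention) looks something like this:

import numpy as np
import aesara
import aesara.tensor as at
from aesara.tensor.nlinalg import det, matrix_inverse

# symbolic placeholders; in the real code these come from the state space matrices
data = at.matrix('data')   # (n_obs, k_endog)
x0 = at.vector('x0')       # initial state mean
P0 = at.matrix('P0')       # initial state covariance
Z = at.matrix('Z')         # design
H = at.matrix('H')         # observation covariance
T = at.matrix('T')         # transition
R = at.matrix('R')         # selection
Q = at.matrix('Q')         # state covariance

def kalman_step(y, x, P, Z, H, T, R, Q):
    # predict
    x_pred = at.dot(T, x)
    P_pred = at.dot(T, at.dot(P, T.T)) + at.dot(R, at.dot(Q, R.T))

    # update
    v = y - at.dot(Z, x_pred)                  # innovation
    F = at.dot(Z, at.dot(P_pred, Z.T)) + H     # innovation covariance
    F_inv = matrix_inverse(F)
    K = at.dot(P_pred, at.dot(Z.T, F_inv))     # Kalman gain
    x_filt = x_pred + at.dot(K, v)
    P_filt = P_pred - at.dot(K, at.dot(F, K.T))

    # per-period contribution to the log likelihood
    ll = -0.5 * (y.shape[0] * at.log(2 * np.pi) + at.log(det(F)) + at.dot(v, at.dot(F_inv, v)))
    return x_filt, P_filt, ll

(states, covariances, ll_obs), _ = aesara.scan(
    kalman_step,
    sequences=[data],
    outputs_info=[x0, P0, None],
    non_sequences=[Z, H, T, R, Q],
)
log_likelihood = ll_obs.sum()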
The compilation is now the responsibility of the PyMC3StateSpace class, which implements two important methods: update and compile_aesara_functions. Again copying from statsmodels, update is responsible for chopping up the parameter vector theta and assigning the elements to the correct places in the state space matrices. Here is the update method from the Local Level model:
def update(self, theta: at.TensorVariable) -> None:
    """
    Put parameter values from vector theta into the correct positions in the state space matrices.

    TODO: Can this be done using variable names to avoid the need to ravel and concatenate all RVs in the
    PyMC model?

    Parameters
    ----------
    theta: TensorVariable
        Vector of all variables in the state space model
    """
    # initial states
    self.ssm['initial_state', :, 0] = theta[:2]

    # initial covariance
    self.ssm['initial_state_cov', :, :] = theta[2:6].reshape((2, 2))

    # Observation covariance
    self.ssm['obs_cov', 0, 0] = theta[6]

    # State covariance
    self.ssm[self._state_cov_idx] = theta[7:]
The compile_aesara_functions method creates four functions that will be used in model evaluation: self.f_loglike, self.f_loglike_grad, self.f_y_hat, and self.f_cov_hat. These functions are called by helper at.Op classes later. Again, here is the method for the Local Level model:
def compile_aesara_functions(self) -> None:
    theta = at.vector('theta')
    self.update(theta)

    states, covariances, log_likelihood = self.kalman_filter.build_graph(self.data, *self.unpack_statespace())

    self.f_loglike = aesara.function([theta], log_likelihood)
    self.f_loglike_grad = aesara.function([theta], at.grad(log_likelihood, theta))
    self.f_y_hat = aesara.function([theta], states)
    self.f_cov_hat = aesara.function([theta], covariances)
Finally, these four functions get wrapped into at.Op classes that look like this:
class StateSpaceLikelihoodGrad(at.Op):
    itypes = [at.dvector]
    otypes = [at.dvector]

    def __init__(self, model):
        self.out_func = model.f_loglike_grad

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        outputs[0][0] = self.out_func(theta)


class StateSpaceLikelihood(at.Op):
    itypes = [at.dvector]
    otypes = [at.dscalar]

    def __init__(self, model):
        self.out_func = model.f_loglike
        self.grad_loglike = StateSpaceLikelihoodGrad(model)

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        outputs[0][0] = self.out_func(theta)

    def grad(self, inputs, output_grads):
        (theta,) = inputs
        return [output_grads[0] * self.grad_loglike(theta)]
I uploaded a sample notebook to the repo; I invite anyone who is interested to check it out. For anyone too lazy to click over, here is what actually fitting the model looks like:
model = BayesianLocalLevel(data=nile.values)

kalman_likelihood = StateSpaceLikelihood(model)
kalman_mean = StateSpaceMean(model)
kalman_cov = StateSpaceCovariance(model)

coords = {'time': nile.index.year.values,
          'states': ['level', 'trend'],
          'obs': ['nile']}

with pm.Model(coords=coords) as nile_model:
    state_sigmas = pm.HalfNormal('state_sigma', sigma=1.0, dims=['states'])
    obs_sigma = pm.HalfNormal('obs_sigma', sigma=1.0, dims=['obs'])

    x0 = pm.Normal('initial_states', mu=0.0, sigma=1.0, dims=['states'])

    sd_dist = pm.Exponential.dist(1.0)
    chol, _, _ = pm.LKJCholeskyCov('initial_state_cov', n=2, eta=1.0,
                                   sd_dist=sd_dist)
    P0 = pm.Deterministic('P0', at.dot(chol, chol.T).ravel())

    theta = at.concatenate([x0.ravel(), P0.ravel(), obs_sigma.ravel(), state_sigmas.ravel()])

    likelihood = pm.Potential("likelihood", kalman_likelihood(theta))
    y_hat = pm.Deterministic('y_hat', kalman_mean(theta))
    cov_hat = pm.Deterministic('cov_hat', kalman_cov(theta))

    prior_params = pm.sample_prior_predictive(var_names=['y_hat', 'cov_hat'])
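Sampling is then just the usual call inside the same model context (not shown in the snippet above):

with nile_model:
    idata = pm.sample()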
This fit fine and without any divergences (I forgot to put traces in the notebook, something for v2), but it was still quite slow: it took about 15 minutes to sample. I imagine this is in no small part because of how I put the system together; I would be curious if anyone can spot obvious problems. I have no idea if the gradient computations are correct, for example: I just copied an example notebook. Some of the slowness might also be due to the LKJ covariance matrix I foolishly insisted on using. There are closed-form solutions for marginalizing over the initial state and covariance values; I will add that option soon.
Actual estimates end up being very close to the ML solution, so I am cautiously optimistic I didn’t mess anything up. Here are the posteriors, with ugly red lines denoting the ML point estimates. As you can see, everything is pretty bang on.
Finally, the actual estimated state values. I'm not sure how to combine the uncertainty that comes from the parameter estimates with the variances from the model, so I didn't do it in this plot. The shaded region is just the 90% CI around the mean estimate, not the true uncertainty around the model estimates (which needs to account for the covariance estimation).
So where to go from here? Given that I have already copied statsmodels so blatantly, my next step will be to copy their SARIMAX API. The goal is to be able to specify the model using order=(p, d, q), seasonal_order=(P, D, Q, s), and trend (one of 'c', 't', 'ct'), then let the user set up the model as above. It would also be nice to have some helper functions to gather up, flatten, and concatenate the parameters in the PyMC model for the user (there is a sketch of one possibility after the example below). Ideally, we would end up with something like:
model = BayesianSARIMAX(endog=data, exog=feature_df, order=(1, 0, 0), trend='ct')

kalman_likelihood = StateSpaceLikelihood(model)
kalman_mean = StateSpaceMean(model)
kalman_cov = StateSpaceCovariance(model)

with pm.Model(coords=coords) as time_series_model:
    # assign priors
    theta = helper_function(parameter_list)
    likelihood = pm.Potential("likelihood", kalman_likelihood(theta))
This would leave things open for hierarchical modeling.
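The helper function mentioned above could be as simple as something like this (a hypothetical sketch; gather_parameters is not in the repo, and it assumes the RVs can be looked up by name on the model and are concatenated in the order update() expects):

import aesara.tensor as at

def gather_parameters(pymc_model, parameter_names):
    # Hypothetical helper: look up named RVs on the model, flatten each one,
    # and concatenate them into the single theta vector expected by update()
    rvs = [pymc_model[name] for name in parameter_names]
    return at.concatenate([rv.ravel() for rv in rvs])

# e.g. theta = gather_parameters(nile_model, ['initial_states', 'P0', 'obs_sigma', 'state_sigma'])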
In principle, VARMAX models are also well within reach, as well as exponential moving average models, even DSGE models. Anything that can be represented in state space form can be dropped into this framework by extending the PyMCStateSpace class (a rough sketch of what that could look like follows this paragraph). I just need to go over the kalman_step method in the KalmanFilter to make sure the shapes are all correct for the case when 3d and 4d tensors are passed in; I'm pretty sure that's not currently the case. I also need to update the Kalman filter equations to include the trend vectors in the transition and observation equations.
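For example, an AR(2) model might look roughly like this (untested pseudocode: the constructor signature, the k_states/k_posdef arguments, and the matrix names are guesses based on the local level example and the statsmodels representation):

class BayesianAR2(PyMCStateSpace):
    def __init__(self, data):
        # two states (companion form), one observed series, one shock;
        # the base class interface here is assumed, not the repo's actual signature
        super().__init__(data, k_states=2, k_posdef=1)

        # fixed parts of the system matrices
        self.ssm['design', 0, 0] = 1.0
        self.ssm['transition', 1, 0] = 1.0
        self.ssm['selection', 0, 0] = 1.0

    def update(self, theta):
        # theta = [rho_1, rho_2, obs_variance, state_variance]
        self.ssm['transition', 0, :] = theta[:2]
        self.ssm['obs_cov', 0, 0] = theta[2]
        self.ssm['state_cov', 0, 0] = theta[3]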
Other potential steps:
- Impulse response functions
- A simulation method, which would allow for non-normally distributed errors via pm.Simulate?
- More filter classes to extend to other error types: Bayes filter, unscented Kalman filter, etc.
This is of course in addition to figuring out how to improve this basic setup and get more speed out of it. Anyway, I'm quite excited by it; hopefully it is interesting to someone else out there too!

