I’ve been thinking about this and playing with it as time permits, and I wonder if there is a work-around that would save the sampler a lot of hassle. The innovations aren’t just independent between timesteps – for all the traditional statistical time series models that can be cast into state-space form (SARIMAX, VARMAX), the distribution over innovations is also assumed to be stationary. Any non-stationary dynamics (MA components, for example) can be absorbed into the state or observation equations to render the noise stationary.
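As a concrete illustration of that last point (a hypothetical NumPy sketch, not tied to any particular library): an MA(1) process y_t = e_t + theta*e_{t-1} can be rewritten in state-space form so that the only noise entering the system is a single i.i.d. (stationary) shock per timestep.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, T = 0.5, 200
e = rng.normal(size=T)  # one i.i.d. shock per timestep

# Direct MA(1) simulation: y_t = e_t + theta * e_{t-1}
y_direct = np.empty(T)
y_direct[0] = e[0]
y_direct[1:] = e[1:] + theta * e[:-1]

# State-space form with state x_t = [y_t, theta * e_t]':
#   x_t = A x_{t-1} + R e_t,   y_t = Z x_t
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
R = np.array([1.0, theta])
Z = np.array([1.0, 0.0])

x = np.zeros(2)
y_ss = np.empty(T)
for t in range(T):
    x = A @ x + R * e[t]   # state transition driven by stationary noise
    y_ss[t] = Z @ x        # observation equation

assert np.allclose(y_direct, y_ss)
```

The MA structure lives entirely in the transition matrices `A` and `R`; the noise itself stays i.i.d. Gaussian throughout.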
Perhaps I have a fundamental misunderstanding about what happens when a random variable is given a shape. When I implement a T x k innovation series, I force the sampler to estimate a huge number of nuisance parameters, when in fact I only need the parameters that describe the transition and observation noise. It’s also not obvious to me what these epsilons are in the state-space framework. They aren’t quite residuals, since those are associated with the observation noise; they seem to be just an artifact of how I’m implementing the model.
With that in mind, I wonder if there is a better way to write it? Really the hidden states should be the random variables, with mean defined autoregressively and constant variance.
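To make the two parameterizations concrete (a minimal NumPy sketch, using a hypothetical AR(1) latent state x_t = rho * x_{t-1} + eps_t with eps_t ~ N(0, sigma^2); the variable names are mine, not from any library): the "non-centered" version samples a T-vector of innovations and accumulates them into states, while the "centered" version treats each state itself as the random variable, with autoregressive mean and constant variance. Given the same underlying draws, the two constructions produce the identical path, which is one way of seeing that they describe the same joint distribution.

```python
import numpy as np

rng = np.random.default_rng(42)
rho, sigma, T = 0.9, 0.5, 100
eps = sigma * rng.normal(size=T)  # the T nuisance innovations

# Non-centered: the T innovations are the random variables,
# and the states are a deterministic function of them.
x_noncentered = np.empty(T)
x_noncentered[0] = eps[0]
for t in range(1, T):
    x_noncentered[t] = rho * x_noncentered[t - 1] + eps[t]

# Centered: each state is itself the random variable,
# x_t ~ N(rho * x_{t-1}, sigma^2) -- i.e. mean defined
# autoregressively, constant variance. Realizing each state
# as its mean plus the same shock recovers the same path.
x_centered = np.empty(T)
x_centered[0] = 0.0 + eps[0]
for t in range(1, T):
    x_centered[t] = rho * x_centered[t - 1] + eps[t]

assert np.allclose(x_noncentered, x_centered)
```

In a probabilistic programming language, the centered version would put the distribution directly on the state sequence (so the sampler works in state space), rather than on a T x k block of epsilons that then gets scanned into states. Which parameterization samples better is geometry-dependent, so this is a modeling choice rather than a free lunch.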