Modelling colored noise with varying correlation length and variance with a large dataset

I’m looking for suggestions on how to model colored noise with varying correlation length and (co)variance in pymc3. Ideally, if my dataset was smaller, I’d model it as a GP with a non-stationary covariance kernel with a quite short length and high variance scale. The goal would be to infer the variance and length scale. However, that is not computationally feasible and a sparse approximation with inducing points cannot adequately describe short lengths.

So I’m looking for alternative approaches. One would be to implement a sparse Cholesky factorization using scikit-sparse and then use it for the mostly sparse (below some numerical precision factor due to the short length scale) covariance matrix. But that seems like a lot of Theano work. In some ways this would be similar to the Sparse CAR model pymc3 example, but the CAR is somewhat less general in terms of the length scale.

Another approach would be to reformulate it as a non-stationary AR and/or MA model. But that formulation makes it harder to infer the (auto)covariance scales, although I suppose its is possible according to the equations in this article. pymc3 already has an AR model and a Gaussian walk (essentially an MA(1)) model.

I’m currently considering going for an MA model, because to some extent it would be similar to using a sparse Cholesky factor for a MvNormal, because The Cholesky factor L is likely also mostly sparse and u = L v essentially connects a chain of Normal v values.

Your question reminds me of this notebook:
I think using a Gaussian random walk model similar to this notebook is a good start. The key to me in that example is that you can express the total variance at each data point as a combination of noise and covariance from the data point right next to it.

Maybe a log-linear model on the variance?

\sigma^2_i = \exp \left( X_i \beta \right)

where X is some design matrix / basis and \beta are unknown coefficients? Could use a spline basis. This doesn’t help you with inferring a specific lengthscale. But you say the lengthscale changes over time though?

The wendland polynomial based covariance function is naturally sparse. There’s some about this in the Rasmussen + Williams book. The sparse cholesky factorization would take some theano work for sure… The gradient implementation of sparse cholesky would also need to be sparse.

The Gaussian random walk model is (according to my understanding) essentially an AR(1) (or very long MA) model. Yeas, I was thinking about soemthing like that, but I wasn’t sure how to relate the covariance between points with the innovation variance. Your formulation

got me thinking and I looked into it

Assuming something like an AR(1) model with a_1 = 1

x_i = x_{i-1} + u_i

Where u_i \sim \mathcal{N}(0, \sigma_i^2)

\mathrm{cov}(x_i, x_{i-1}) = \mathbf{E}\left( (x_i - \mathbf{E}x_i) (x_{i-1} - \mathbf{E}x_{i-1}) \right) =\mathbf{E}\left( (x_i - x_{i-1}) (x_{i-1} - x_{i-2}) \right)\\ = \mathbf{E}(u_i u_{i-1}) = \mathrm{cov}(u_i, u_{i-1})

So for u_i = u_{i-1} (homoscedastic) the covariance between points is the variance of the innovation.

Are you aware of some bibliography relating the covariance between points to the (co)variance of the innovations in a more general sense? I noticed that some text on computational statistics (e.g. Computational statistics, Gentle, 2099) ) mention that a MvNormal can be generated as essentially a MA sequence with coefficients given by something like the Cholesky factor of the MVNormal covariance, but they didn’t go into further detail.