Maybe a log-linear model on the variance?
\sigma^2_i = \exp \left( X_i \beta \right)
where X is some design matrix / basis and \beta are unknown coefficients? Could use a spline basis. This doesn’t help you with inferring a specific lengthscale. But you say the lengthscale changes over time though?
The wendland polynomial based covariance function is naturally sparse. There’s some about this in the Rasmussen + Williams book. The sparse cholesky factorization would take some theano work for sure… The gradient implementation of sparse cholesky would also need to be sparse.