Hierarchical GPs with varying lengthscales

In the code I posted, the second model uses pt.linalg.cholesky.

pt.linalg.cholesky performs a generic dense decomposition, applicable to any symmetric positive-definite matrix, and runs in O(n^3) time.
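
For reference, here is a minimal sketch of that dense path (a toy setup, not the model from the thread): a squared-exponential GP covariance built in NumPy and factorized with the stock op.

```python
import numpy as np
import pytensor.tensor as pt

n_x = 200
x = np.linspace(0.0, 1.0, n_x)
# Cholesky needs a symmetric positive-definite input, hence the jitter.
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1 ** 2) + 1e-6 * np.eye(n_x)

L = pt.linalg.cholesky(K)  # dense, O(n_x^3), lower-triangular factor
L_val = L.eval()
print(np.allclose(L_val @ L_val.T, K))  # should print True
```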

The Bareiss algorithm is specific to Toeplitz matrices and has O(n^2) time complexity (and O(n^2) memory). ^1

So we should expect the Bareiss algorithm to outperform a traditional Cholesky on Toeplitz matrices, perhaps with enough overhead that the benefit only appears once the matrix reaches a certain size, but with the advantage growing as n_x increases thereafter.
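
To make that concrete, here is a minimal NumPy sketch of the Bareiss-style (generalized Schur) recursion for a symmetric positive-definite Toeplitz matrix. It illustrates the structure only, not the implementation from my posted code, and `toeplitz_cholesky` is just a name I'm using here:

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_cholesky(t):
    """Lower Cholesky factor of the SPD Toeplitz matrix with first column t.

    O(n^2) time; the only working storage is two length-n generator vectors.
    """
    t = np.asarray(t, dtype=float)
    n = t.shape[0]
    L = np.zeros((n, n))
    # Generators of the displacement T - Z T Z^T = u u^T - v v^T
    u = t / np.sqrt(t[0])
    v = u.copy()
    v[0] = 0.0
    L[:, 0] = u
    for k in range(1, n):
        u = np.roll(u, 1)   # shift the "positive" generator down one row
        u[:k] = 0.0         # rows above k are no longer needed
        rho = v[k] / u[k]   # hyperbolic rotation parameter (|rho| < 1 for SPD input)
        c = np.sqrt((1.0 - rho) * (1.0 + rho))
        u_new = (u[k:] - rho * v[k:]) / c
        v[k:] = (v[k:] - rho * u[k:]) / c
        u[k:] = u_new
        L[k:, k] = u[k:]    # k-th column of the factor
    return L

if __name__ == "__main__":
    t = 0.5 ** np.arange(8)                    # AR(1)-style covariance: SPD Toeplitz
    L = toeplitz_cholesky(t)
    print(np.allclose(L @ L.T, toeplitz(t)))   # should print True
```

Note that the O(n^2) memory in the main text comes from storing the dense factor itself; the recursion only touches the two generator vectors per step, which is also what makes batching straightforward (see below).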

In reality, though, we are comparing a very low-level implementation of traditional Cholesky (including explicit gradients) against a very high-level implementation of Bareiss, so when decomposing a single matrix I expected Bareiss to be slower. A large enough n_x might still reveal a crossover point, but I didn't think that was worth exploring.

I do think it would be worthwhile adding an explicit gradient for Bareiss ^2 to see whether that speeds up the single-matrix case. But I was actually more interested in how the Bareiss algorithm is structured: vectorized computation of the decomposition of multiple matrices is easy to add. When you set n_id > 1 in the code I provided, we are testing that multiple-matrices scenario, and there is at least a window of n_x and n_id values where the Bareiss algorithm is substantially faster than traditional Cholesky.
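
As a rough illustration of why batching is cheap: every update in the sketch above is an elementwise operation on the two generator vectors, so a leading batch axis broadcasts through unchanged. A hypothetical batched variant (again a sketch, not my posted code; `n_id` matrices with their first columns stacked row-wise):

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_cholesky_batched(t):
    """Lower Cholesky factors of n_id SPD Toeplitz matrices.

    t has shape (n_id, n_x): one first column per matrix. Returns L with shape
    (n_id, n_x, n_x); the same O(n_x^2) recursion per matrix, run across the
    batch with broadcasting rather than a Python loop over matrices.
    """
    t = np.asarray(t, dtype=float)
    n_id, n = t.shape
    L = np.zeros((n_id, n, n))
    u = t / np.sqrt(t[:, :1])
    v = u.copy()
    v[:, 0] = 0.0
    L[:, :, 0] = u
    for k in range(1, n):
        u = np.roll(u, 1, axis=1)
        u[:, :k] = 0.0
        rho = v[:, k] / u[:, k]                    # one rotation parameter per matrix
        c = np.sqrt((1.0 - rho) * (1.0 + rho))
        u_new = (u[:, k:] - rho[:, None] * v[:, k:]) / c[:, None]
        v[:, k:] = (v[:, k:] - rho[:, None] * u[:, k:]) / c[:, None]
        u[:, k:] = u_new
        L[:, k:, k] = u[:, k:]
    return L

if __name__ == "__main__":
    ell = np.array([0.1, 0.25, 0.5])                  # one lengthscale per group
    xs = np.linspace(0.0, 1.0, 50)
    t = np.exp(-np.abs(xs)[None, :] / ell[:, None])   # OU kernel: well-conditioned SPD Toeplitz
    L = toeplitz_cholesky_batched(t)
    print(all(np.allclose(L[i] @ L[i].T, toeplitz(t[i])) for i in range(len(ell))))
```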

Footnotes:
^1 There is a less memory-hungry "Levinson-Durbin" algorithm (used in scipy.linalg.solve_toeplitz) that is also O(n^2) in time but only O(n) in memory; however, LD is known to suffer from numerical instability, whereas Bareiss is provably stable. I actually started my explorations with the LD algorithm, but even for low values of n_x I was seeing a concerning amount of accumulated error in the final terms of the decomposition, hence the switch to Bareiss.
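
For anyone who wants to probe that kind of error themselves: scipy.linalg.solve_toeplitz only exposes the solve rather than the factor, but comparing it against a dense solve on a smooth (hence badly conditioned) kernel matrix is a quick check. This is an assumed setup for illustration, not my exact experiment:

```python
import numpy as np
from scipy.linalg import toeplitz, solve_toeplitz, solve

n_x = 500
x = np.linspace(0.0, 1.0, n_x)
t = np.exp(-0.5 * (x - x[0]) ** 2 / 0.2 ** 2)   # squared-exponential kernel: ill-conditioned
t[0] += 1e-8                                    # jitter on the diagonal
b = np.random.default_rng(0).normal(size=n_x)

x_ld = solve_toeplitz(t, b)      # Levinson-Durbin, O(n^2)
x_dense = solve(toeplitz(t), b)  # dense LAPACK solve, O(n^3)

# Relative difference between the two solutions.
print(np.linalg.norm(x_ld - x_dense) / np.linalg.norm(x_dense))
```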

^2 I might end up first trying the "sub-quadratic" algorithm implemented in the SuperGauss package, which is O(n log n) in complexity and already has the gradient coded.
