Hierarchical GPs with varying lengthscales

Very cool, I didn’t know about these methods – thank you for the explanation!

It wouldn’t take much work to wrap scipy.linalg.solve_toeplitz into an Op and use it for both the forward and backward pass (the sensitivity equations of solve involve another solve against A, so you can take advantage of its structure both times). For that matter, I don’t think it would be too onerous to take the code you have for Bareiss and make a COp, so that you avoid the scan and stay in C-land for the entire computation. Then you can scan (or vectorize) over your huge stack of matrices. A numba implementation would be even easier: you could do a basic Op with a Python loop for the Bareiss algorithm, then add a numba dispatch to an njit version that pytensor could use when sampling your model.
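
For concreteness, here’s a rough, untested sketch of what a solve_toeplitz Op could look like (the class name and layout are made up, and I’m punting on the gradient with respect to the Toeplitz coefficients to keep it short; since scipy assumes the matrix is symmetric when you only pass the first column, the backward pass with respect to b is just another Toeplitz solve):

```python
from scipy.linalg import solve_toeplitz

import pytensor.tensor as pt
from pytensor.gradient import grad_not_implemented
from pytensor.graph.basic import Apply
from pytensor.graph.op import Op


class SolveToeplitz(Op):
    """Solve T @ x = b, where T is the symmetric Toeplitz matrix with first column c."""

    __props__ = ()

    def make_node(self, c, b):
        c = pt.as_tensor_variable(c)
        b = pt.as_tensor_variable(b)
        return Apply(self, [c, b], [b.type()])

    def perform(self, node, inputs, output_storage):
        c, b = inputs
        # Levinson-Durbin solve: O(n^2) instead of a dense O(n^3) solve
        output_storage[0][0] = solve_toeplitz(c, b)

    def grad(self, inputs, output_grads):
        c, b = inputs
        (x_bar,) = output_grads
        # x = T^{-1} b, so the gradient w.r.t. b is T^{-T} x_bar. T is symmetric,
        # so the backward pass is just another Toeplitz solve with the same c.
        b_bar = self(c, x_bar)
        # The gradient w.r.t. c needs a projection of -outer(b_bar, x) onto the
        # Toeplitz diagonals; omitted here to keep the sketch short.
        return [grad_not_implemented(self, 0, c), b_bar]


solve_toeplitz_pt = SolveToeplitz()
```

From there you could scan (or vectorize) over the stack of first columns and right-hand sides, and the numba route would just register an njit-ed solver for the same Op.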

One other thing to think about is that in pytensor we do optimizations when you have solve(A, b) and solve(A, c) by rewriting this into lu_and_piv = lu_factor(A) --> lu_solve(lu_and_piv, b), lu_solve(lu_and_piv, c). In the positive definite case, this turns into L = cholesky(A) --> cho_solve(L, b), cho_solve(L, c). Reusing the decomposition always comes up when you are computing values and gradients, so I wonder if it’s something that could also be exploited in this case. If I understand your code correctly, you’re doing a special-case Cholesky decomposition? You could probably just hand that off to cho_solve and skip the forward step. My uneducated guess is that this would end up with better gradients.
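
To make the reuse pattern concrete, this is roughly what that rewrite produces when written out by hand (just a sketch with made-up variable names; I’m assuming pytensor’s cho_solve keeps the scipy-style (factor, lower) calling convention):

```python
import pytensor
import pytensor.tensor as pt
from pytensor.tensor.slinalg import cholesky, cho_solve

A = pt.matrix("A")  # symmetric positive definite
b = pt.vector("b")
c = pt.vector("c")

# Factor once...
L = cholesky(A, lower=True)

# ...then reuse the factor for every right-hand side. The same factor can also
# be reused in the backward pass, since the gradient of each solve is another
# solve against A.
x = cho_solve((L, True), b)
y = cho_solve((L, True), c)

fn = pytensor.function([A, b, c], [x, y])
```

If your special-case decomposition already produces a triangular factor, handing that factor directly to cho_solve would give you exactly this structure without recomputing anything.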