Haversine Kernel?

I’m just getting started with Gaussian processes. I have watched Statistical Rethinking 2023 - 16 - Gaussian Processes a few times and listened to some episodes of the Learning Bayesian Statistics podcast where Gaussian processes were discussed. I also read through the Gaussian Processes — PyMC 5.15.1 documentation. Now I want to start using them in my own projects, but I would like to use a kernel which doesn’t appear to be implemented yet.

In my case I have observations in terms of latitude and longitude, which I intend to convert to radians. I would like to use a base kernel built on the haversine distance:

$$
(\phi_1, \phi_2, \lambda_1, \lambda_2) \mapsto 2 r \arcsin \left( \sqrt{\frac{1-\cos(\phi_2-\phi_1) + \cos \phi_1 \cos \phi_2 \left(1 - \cos (\lambda_2 - \lambda_1) \right)}{2}} \right)
$$
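For concreteness, the distance itself is easy enough to compute in plain NumPy; this is just my own sketch (the function name and default radius are placeholders, not from any library), using the identity $(1-\cos x)/2 = \sin^2(x/2)$:

```python
import numpy as np

def haversine_distance(lat1, lon1, lat2, lon2, r=1.0):
    """Great-circle distance between two points given in radians.

    `r` is the sphere radius; with r=1 the result is the central angle.
    """
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Same formula as above, written with (1 - cos x)/2 = sin^2(x/2)
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return 2.0 * r * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
```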

Is there a straightforward way to implement a kernel like this?

CC @lucianopaz

Hi @Galen_Seilis, that’s a great question with an uncomfortable answer. I had the same question some years ago, and I ended up implementing a Matérn kernel that used the haversine distance (i.e. the great-circle distance), and I had to dig deep to figure out why my kernel gave back such garbage.

I think you had the same intuition that I had: use the haversine distance instead of the Euclidean distance inside a standard kernel like one from the Matérn family. The short answer is that you cannot simply put any distance function inside a Matérn kernel and hope that the result is a valid kernel (insert Boromir meme). For a short discussion, you can read section 2.1 here. Defining kernels on manifolds like the sphere is perfectly possible, but you have to make sure that you end up with a positive semidefinite function (the only real property required of a GP covariance kernel). Plugging the haversine distance into a Matérn kernel does not, in general, give a positive semidefinite function, and that makes your GP wrong (it’s like trying to model Gaussian noise with a negative variance).
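A cheap empirical check (my own sketch, nothing PyMC-specific) is to build a Gram matrix with the candidate kernel and look at its smallest eigenvalue. A legitimate covariance kernel can never produce a negative one beyond numerical round-off, whereas a Matérn-3/2 applied to the geodesic distance can; whether you actually see a negative value depends on the points and the lengthscale:

```python
import numpy as np

def great_circle(p, q):
    """Great-circle (haversine) distance between (lat, lon) points in radians, r = 1."""
    dlat, dlon = q[0] - p[0], q[1] - p[1]
    a = np.sin(dlat / 2) ** 2 + np.cos(p[0]) * np.cos(q[0]) * np.sin(dlon / 2) ** 2
    return 2 * np.arcsin(np.sqrt(np.clip(a, 0, 1)))

def matern32_of_great_circle(p, q, ls=0.5):
    """Matérn-3/2 shape applied directly to the geodesic distance (not a valid kernel in general)."""
    d = great_circle(p, q)
    return (1 + np.sqrt(3) * d / ls) * np.exp(-np.sqrt(3) * d / ls)

# Gram matrix on random points on the sphere
rng = np.random.default_rng(0)
pts = np.column_stack([
    rng.uniform(-np.pi / 2, np.pi / 2, 100),  # latitudes
    rng.uniform(-np.pi, np.pi, 100),          # longitudes
])
K = np.array([[matern32_of_great_circle(p, q) for q in pts] for p in pts])

# A valid covariance must have all eigenvalues >= 0 (up to round-off).
print(np.linalg.eigvalsh(K).min())
```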

There are a couple of easy ways around this problem:

  1. Use the chordal distance instead of the great-circle distance. This works because the chordal distance is just the Euclidean distance between points constrained to lie on the surface of a sphere, so any standard Euclidean kernel applied to it stays valid (see the sketch after this list).
  2. Work with projections onto flat 2D planes. An example of this is the UTM system.
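For option 1, a minimal sketch of what I mean, assuming the standard PyMC covariance `pm.gp.cov.Matern52` and made-up placeholder data: map lat/lon to 3D unit vectors and hand those to an ordinary Euclidean kernel, which then implicitly measures the chordal distance.

```python
import numpy as np
import pymc as pm

def latlon_to_xyz(lat, lon):
    """Map latitude/longitude (in radians) to points on the unit sphere in R^3."""
    return np.column_stack([
        np.cos(lat) * np.cos(lon),
        np.cos(lat) * np.sin(lon),
        np.sin(lat),
    ])

# Placeholder lat/lon observations, converted to radians
lat = np.deg2rad(np.array([48.1, 52.5, 40.7, -33.9]))
lon = np.deg2rad(np.array([11.6, 13.4, -74.0, 151.2]))
X = latlon_to_xyz(lat, lon)  # shape (n, 3)

with pm.Model() as model:
    ls = pm.Gamma("ls", alpha=2, beta=1)
    eta = pm.HalfNormal("eta", sigma=1)
    # Euclidean Matérn-5/2 on the 3D embedding = Matérn-5/2 of the chordal distance
    cov = eta**2 * pm.gp.cov.Matern52(input_dim=3, ls=ls)
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", X=X)
```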

The harder path is to actually find kernels that work with the great-circle distance. You can have a look at the reference I shared earlier, or you could also try out what these people propose, keeping in mind that they actually want to be able to use non-stationary kernels on the sphere.


Not sure if this is appropriate because the author himself did not link to it: Modeling spatial data with Gaussian processes in PyMC - PyMC Labs, but my semantic cluster algorithm thought it could be relevant.


I’d like to implement this in PyMC as an HSGP. Unfortunately, I can’t promise any timeline though. @bwengals


@Galen_Seilis, in your particular case, is latitude/longitude actually the most convenient parameterization?


Good question. I don’t have any reason to say that parametrizing by lat/long is optimal. The coordinates in the data are given as lat/long, but a change of coordinates to improve the math or the MCMC is an option. Among the maps one might use to change coordinates, I would prefer diffeomorphisms on the sphere.