I’m just getting started with trying to use Gaussian processes. I have watched Statistical Rethinking 2023 - 16 - Gaussian Processes a few times and listened to some episodes on the Learning Bayesian Statistics where Gaussian processes were discussed. I also read through Gaussian Processes — PyMC 5.15.1 documentation. Now I want to start using them in my own projects, but I would like to use a kernel which doesn’t appear to be implemented yet.
In my case I have observations in terms of latitude and longitude, which I intend on converting to radians. I would like to use a base kernel equal to the haversine distance:
Hi @Galen_Seilis, that’s a great question with an uncomfortable answer. I had the same question some years ago, and I ended up implementing a Matern kernel that used the Haversine distance (actually the grand-circle distance), and I had to dig deep to find out why my kernel gave back such huge CRAP.
I think that you had the same intuition that I had, use the Haversine distance instead of the Euclidean distance with a standard kernel like one from the Matern family. The short answer is that you cannot simply put any distance function inside of a Matern kernel and hope that the result works as a valid kernel (insert Boromir meme). For a short discussion, you can read section 2.1 here. Defining kernels on manifolds, like a sphere is perfectly possible, but you have to be careful that you end up with a positive semidefinite function at the end (the only real property required by a GP covariance kernel). Putting the Haversine inside Matern, is not positive semidefinite, and makes your GP wrong (it’s like trying to model Gaussian noise using a negative variance).
There are a couple of easy ways around this problem:
Use the Chordal distance instead of the grand-circle distance. This works because Chordal distance is simply Euclidean distance of points that are constrained to be on the surface of a sphere.
Work with projections onto flat 2D planes. An example of this is the UTM system.
The harder path, is to try to actually find a way to get kernels that work with the grand-circle distance. You can have a look at the reference I shared earlier, or you could also try out the things these people say, keeping in mind that they actually want to be able to use non-stationary kernels on the sphere.
Good question. I don’t have any reason to say that it is optimal to parametrize by lat/long. The coordinates in the data are given as lat/long, but a change of coordinates to improve the math or MCMC is an option. Diffeomorphisms on a sphere are preferable among maps that one may use to change the coordinates.