Changepoint and Changewindow Kernels for GP

I’m looking to (somewhat loosely) reproduce a lot of Duvenaud’s work on “tree-search” additive kernels, and play around with it in more engineering-focused domains.

I love the new GP API, but there seems to be one last major kernel needed before this can be implemented, namely the changepoint kernel, demonstrated in section 2 here and in some detail here.
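For context, the changepoint construction in those references is essentially a sigmoid-weighted combination of two base kernels (notation mine: x_0 is the changepoint location, a controls how sharp the switch is):

```
k_{CP}(x, x') = \sigma(x)\, k_1(x, x')\, \sigma(x')
              + (1 - \sigma(x))\, k_2(x, x')\, (1 - \sigma(x')),
\qquad
\sigma(x) = \frac{1}{1 + \exp(-(x - x_0)/a)}
```

The changewindow version is the same idea with the sigmoid replaced by a product of two sigmoids that forms a window, so one kernel is active only inside an interval.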

Any chance this might see implementation? @bwengals I actually found this via your blog post; it was a great read!

I’m wondering how well the sampler would fare even if this were implemented… Discontinuities seem to be problematic (requiring, for example, special treatment via the “manifold kernel” or similar for step-function models). However, this type of function is necessary in the sensor-anomaly domain, e.g. in manufacturing PHM (prognostics and health management), and I think the methodology in Duvenaud et al. would prove beneficial to a lot of operations researchers.

So, I mostly wanted to open a discussion about discontinuity-modeling kernels (starting with changepoints) and get a “wishlist” of kernels going:

I’m going to start looking into frankenstein-ing some of these together with the new GP module, but any discussion/direction is definitely welcome.
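As a starting point, here’s a rough sketch of what I’m imagining, written against the custom-covariance pattern in the new GP module (subclassing pm.gp.cov.Covariance and filling in full/diag). The Changepoint name and the x0/a parameterization are just mine, not anything in the codebase:

```python
import theano.tensor as tt
import pymc3 as pm

class Changepoint(pm.gp.cov.Covariance):
    """Sketch of a 1D changepoint covariance: k1 before x0, k2 after,
    blended by a sigmoid with steepness 1/a. Not library API, just an idea."""

    def __init__(self, input_dim, k1, k2, x0, a=1.0, active_dims=None):
        super(Changepoint, self).__init__(input_dim, active_dims)
        self.k1, self.k2 = k1, k2
        self.x0, self.a = x0, a

    def _s(self, X):
        # sigmoid switch for 1D inputs, flattened to a vector
        return tt.flatten(tt.nnet.sigmoid((X - self.x0) / self.a))

    def full(self, X, Xs=None):
        s1 = self._s(X)
        s2 = s1 if Xs is None else self._s(Xs)
        return (tt.outer(s1, s2) * self.k1(X, Xs)
                + tt.outer(1 - s1, 1 - s2) * self.k2(X, Xs))

    def diag(self, X):
        s = self._s(X)
        return s**2 * self.k1(X, diag=True) + (1 - s)**2 * self.k2(X, diag=True)

# e.g. a wiggly-to-smooth switch at x0 = 0.5:
# k1 = pm.gp.cov.ExpQuad(1, ls=0.2)
# k2 = pm.gp.cov.ExpQuad(1, ls=2.0)
# cov = Changepoint(1, k1, k2, x0=0.5, a=0.05)
```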

Sorry for the slow response. Yes, I definitely agree, a changepoint kernel is really critical.

If I remember correctly, I implemented a simple changepoint kernel in 1D and sampling didn’t have too many problems. I didn’t get around to completing it and merging it into the codebase, unfortunately. If the domain is continuous, then the changepoint kernels won’t have any issues.
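For concreteness, a rough sketch of how sampling over the changepoint location might look with the Marginal implementation (reusing the hypothetical Changepoint covariance sketched above, on toy data):

```python
import numpy as np
import pymc3 as pm

# toy 1D data: a wiggly regime that switches to a smooth trend near x = 0.5
X = np.linspace(0, 1, 50)[:, None]
y = np.where(X[:, 0] < 0.5, np.sin(20 * X[:, 0]), 0.5 * X[:, 0]) + 0.05 * np.random.randn(50)

with pm.Model() as model:
    # priors on the switch location and the two regimes' lengthscales
    x0 = pm.Normal("x0", mu=0.5, sd=0.2)
    ls1 = pm.Gamma("ls1", alpha=2, beta=1)
    ls2 = pm.Gamma("ls2", alpha=2, beta=1)

    k1 = pm.gp.cov.ExpQuad(1, ls=ls1)
    k2 = pm.gp.cov.ExpQuad(1, ls=ls2)
    cov = Changepoint(1, k1, k2, x0=x0, a=0.05)  # hypothetical class from the sketch above

    gp = pm.gp.Marginal(cov_func=cov)
    sigma = pm.HalfNormal("sigma", sd=1.0)
    y_obs = gp.marginal_likelihood("y_obs", X=X, y=y, noise=sigma)

    trace = pm.sample(1000)
```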

Since there’s a wishlist, I thought this looked pretty cool too.

Absolutely. Those are rather impressive results; I’m going to have to keep an eye out for their work. I’ve added it to the “list”!

On another note, there’s something particularly magical about seeing the learned kernel as a form of interpretability for the designer. That makes me wonder if there are more sophisticated ways to review the learned kernel for higher-dimensional regression problems. The key feature of these “informed” kernels is their ability to extrapolate, after which we of course need to verify the results. The learned kernel looks like a great way to make sure our intuition about the underlying structure is being captured, but this isn’t as straightforward for higher-dimensional problems. I’ll have to think about this…