Uncertainty in Gaussian processes outside of the input range

Consider the following image of a fitted GP:

[Image: fitted GP]

Note how 0 <= x <= 2 yields much higher uncertainty than e.g. 5 <= x <= 8.

Thus GPs are well suited to the exploration vs. exploitation dilemma, which is useful in e.g. Bayesian optimization.

Now consider e.g. y = c*x, which we fit with Bayesian linear regression. No matter where the data points lie, the uncertainty estimates will be the same. This makes sense: we have a parametric model which assumes an underlying data-generating process and should therefore also hold outside of our range of inputs.

In Bayesian optimization they say that we can use any sort of surrogate function that yields uncertainty estimates, but the one I just described doesn't seem like a good one, since it does not capture the uncertainty outside of our input range.

Now my first question is: if I wanted to include some domain knowledge in my estimates, such that the response should follow y = c*x^{c_2} where 0 <= c_2 <= 1 and c > 0, is there any way I could do that by tweaking the kernels? Does anyone have any examples of that, i.e. building a model that exhibits high uncertainty in unexplored regions but still exhibits domain characteristics such as the saturation (c_2) just described?

My second question is this: are there other methods out there that also capture the high uncertainty when extrapolating? (Extra credit for naming an algorithm that can also incorporate domain knowledge such as that stated above.)

Welcome!

You can check out the Rethinking notebook covering B-splines. Also, don’t just automatically assume that simple models will not appropriately convey uncertainty in ways you might want. For example, see this.
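
As a small illustration of the point about simple models, here is a minimal sketch of conjugate Bayesian linear regression for y = c*x with known noise. The toy data, the prior scale `prior_sd`, the noise scale `noise_sd`, and the helper names are all assumptions made up for this example. With these choices the predictive standard deviation is sqrt(noise_sd^2 + x^2 * Var(c)), so it is not constant: it grows with |x|.

```python
import math

def posterior_slope(xs, ys, noise_sd=1.0, prior_sd=10.0):
    """Posterior mean and variance of c in y = c*x + noise, with c ~ N(0, prior_sd^2)."""
    precision = 1.0 / prior_sd**2 + sum(x * x for x in xs) / noise_sd**2
    var = 1.0 / precision
    mean = var * sum(x * y for x, y in zip(xs, ys)) / noise_sd**2
    return mean, var

def predictive_sd(x_new, slope_var, noise_sd=1.0):
    """Predictive standard deviation of y at x_new."""
    return math.sqrt(noise_sd**2 + x_new**2 * slope_var)

# Hypothetical toy data observed only on 5 <= x <= 8, generated with c = 2.
xs = [5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]

c_mean, c_var = posterior_slope(xs, ys)
for x_new in (6.0, 20.0, 100.0):
    print(x_new, round(predictive_sd(x_new, c_var), 3))
```

Note that for this no-intercept model the uncertainty depends on |x| rather than on distance from the data, so it stays small near x = 0 even with no observations there. That is the parametric assumption doing the work.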


For your first question, yes, you can definitely do this! You can use a couple of changepoint kernels. Where your parametric model y = c x^{c_2} is defined, use it as your mean function (c and c_2 can be unknown parameters that are given priors) and set your covariance function to return all zeros.
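
To make the mean-function part concrete, here is a rough dependency-free sketch. It is not the changepoint construction itself: it only pairs the parametric mean c x^{c_2} with a plain squared-exponential kernel, and it fixes c and c_2 instead of giving them priors. The values of `c`, `c2`, `amp`, `ls`, `noise_sd`, and the toy data are all assumptions for illustration. Away from the data the prediction falls back to the parametric curve while the standard deviation returns to the prior amplitude.

```python
import math

def mean_fn(x, c=2.0, c2=0.5):
    """Parametric mean c * x**c2 (c and c2 are fixed, hypothetical values)."""
    return c * x**c2

def kernel(a, b, amp=1.0, ls=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return amp**2 * math.exp(-0.5 * ((a - b) / ls) ** 2)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z

def gp_predict(x_new, xs, ys, noise_sd=0.1):
    """Posterior mean and sd of the latent function at x_new."""
    K = [[kernel(a, b) + (noise_sd**2 if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [kernel(x_new, a) for a in xs]
    # The GP models the residual around the parametric mean.
    resid = [y - mean_fn(x) for x, y in zip(xs, ys)]
    alpha = solve(K, resid)
    v = solve(K, k_star)
    mu = mean_fn(x_new) + sum(k * a for k, a in zip(k_star, alpha))
    var = kernel(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mu, math.sqrt(max(var, 0.0))

# Hypothetical observations on 5 <= x <= 8 with small hand-picked deviations.
xs = [5.0, 6.0, 7.0, 8.0]
ys = [mean_fn(x) + d for x, d in zip(xs, [0.1, -0.05, 0.02, -0.08])]

for x_new in (6.5, 1.0, 15.0):
    mu, sd = gp_predict(x_new, xs, ys)
    print(x_new, round(mu, 3), round(sd, 3))
```

In PyMC the same idea would go through a custom mean function passed to the GP, with c and c_2 as random variables, but the exact setup depends on your model.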

RE your second question, maybe check out PyMC-BART?