Uncertainty in Gaussian processes outside of input range

rnaomizer · January 16, 2023, 8:23am

Consider the following image, which shows a fitted GP. Note how 0 <= x <= 2 yields much higher uncertainty than, e.g., 5 <= x <= 8.

Thus GPs are good at dealing with the exploration vs. exploitation dilemma, which is useful in, e.g., Bayesian optimization.

Now consider, e.g., y = c*x, which we fit with Bayesian linear regression. No matter where the data points lie, the uncertainty estimates will be the same. This makes sense: we have a parametric model which assumes an underlying data-generating process and should therefore also hold outside of our range of inputs.

In Bayesian optimization, people say that we can use any sort of surrogate function that yields uncertainty estimates, but the one I just described doesn't seem like a good one, since it does not capture the uncertainty outside of our input range.

Now my first question is: if I wanted to include some domain knowledge in my estimates, such that the response should follow y = c*x^{c_2} where 0 <= c_2 <= 1 and c > 0, is there any way I could do that by tweaking the kernels? Does anyone have any examples of that, i.e., building a model that exhibits high uncertainty in unexplored regions but still exhibits domain characteristics such as the saturation (c_2) just described?

My second question is this: are there other methods out there that also capture the high uncertainty when extrapolating? (Extra credit for anyone naming such an algorithm that can also incorporate domain knowledge such as the one stated above.)

cluhmann · January 16, 2023, 5:43pm

Welcome!

You can check out the Rethinking notebook covering B-splines. Also, don’t just automatically assume that simple models will not appropriately convey uncertainty in ways you might want. For example, see this.
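As a rough illustration of that last point, here is a minimal NumPy sketch of conjugate Bayesian linear regression for y = c*x. The data, the known noise level `sigma`, and the prior scale `tau` are all made-up assumptions for the example, not anything from the linked material. It shows that the predictive standard deviation is not constant: it depends on where you predict.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: inputs observed only on [5, 8], true slope 2.0
x = rng.uniform(5.0, 8.0, size=20)
sigma = 1.0  # assumed known observation noise sd
y = 2.0 * x + rng.normal(0.0, sigma, size=x.size)

# Prior c ~ Normal(0, tau^2); the posterior over c is Gaussian (conjugate)
tau = 10.0
post_precision = 1.0 / tau**2 + np.sum(x**2) / sigma**2
post_var = 1.0 / post_precision
post_mean = np.sum(x * y) / sigma**2 * post_var

def predictive_sd(x_new):
    """Posterior predictive sd of a new observation at x_new."""
    x_new = np.asarray(x_new, dtype=float)
    # observation noise plus uncertainty in the slope, scaled by x_new^2
    return np.sqrt(sigma**2 + x_new**2 * post_var)

sd_inside = predictive_sd(6.0)   # inside the observed range
sd_far = predictive_sd(60.0)     # far outside the observed range
print(post_mean, sd_inside, sd_far)
```

Note that for this no-intercept model the predictive uncertainty grows with |x| rather than with distance from the data, so it is still not the same behaviour as a GP with a stationary kernel.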

bwengals · January 17, 2023, 10:38pm

For your first question, yes, you can definitely do this! You can use a couple of changepoint kernels. Where you have your parametric model defined, y = c x^{c_2}, use that as your mean function (c and c_2 can be unknown parameters that are given priors) and set your covariance function to return all zeros.
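To make the mean-function part of this concrete, here is a minimal sketch in plain NumPy (not PyMC) of GP regression with y = c x^{c_2} as the mean function and an ordinary RBF kernel. It leaves out the changepoint kernels, and it fixes c and c_2 at made-up values instead of giving them priors, so treat it only as an illustration of the behaviour. The data, kernel hyperparameters, and function names are assumptions for the example.

```python
import numpy as np

def mean_fn(x, c=2.0, c2=0.5):
    """Parametric mean c * x^c2 (c and c2 fixed here for illustration)."""
    return c * x**c2

def rbf(a, b, amp=1.0, ls=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return amp**2 * np.exp(-0.5 * (d / ls) ** 2)

rng = np.random.default_rng(1)

# Hypothetical data on [5, 8]: the parametric trend plus a small wiggle
x_obs = np.linspace(5.0, 8.0, 15)
noise = 0.1
y_obs = mean_fn(x_obs) + 0.3 * np.sin(x_obs) + rng.normal(0.0, noise, x_obs.size)

x_new = np.array([6.5, 20.0])  # one point inside the data, one far outside

K = rbf(x_obs, x_obs) + noise**2 * np.eye(x_obs.size)
K_s = rbf(x_new, x_obs)
K_ss = rbf(x_new, x_new)

# Standard GP conditioning, applied to the residuals from the mean function
resid = y_obs - mean_fn(x_obs)
post_mean = mean_fn(x_new) + K_s @ np.linalg.solve(K, resid)
post_cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
post_sd = np.sqrt(np.clip(np.diag(post_cov), 0.0, None))

print(post_mean, post_sd)
```

Far from the data, the posterior mean falls back to the parametric curve and the posterior standard deviation returns to the prior amplitude, which is the combination of domain shape and high extrapolation uncertainty asked about above.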

RE your second question, maybe check out PyMC-BART?
