I was wondering, is there any best practice for extrapolating from a regression line vs interpolation?
What I’ve basically done is to scale up sigma the further I get from the last observed value on the line. My reasoning is that interpolated values should generally be more precise, while extrapolated values are less precise.
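Roughly like this, as a sketch of the idea (the linear inflation and the 0.1 factor are just placeholders I picked, not anything principled):

```python
import numpy as np

x_obs_max = 10.0   # last observed x on the fitted line (made up here)
sigma_fit = 0.5    # residual sigma estimated by the regression (made up here)

def predictive_sigma(x_new):
    # inside the observed range: use the fitted sigma as-is;
    # beyond it: inflate sigma in proportion to how far past the data we are
    excess = np.maximum(0.0, np.asarray(x_new) - x_obs_max)
    return sigma_fit * (1.0 + 0.1 * excess)
```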
What is the form of your regression? In many Bayesian contexts, both intercepts and slopes (coefficients) will be uncertain. In such models, extrapolation will automatically be uncertain without having to build it in explicitly. McElreath refers to this as the “bow tie” pattern.
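For example, in an ordinary Bayesian linear regression the interval around the fitted line fans out as you move away from the data, with no extra machinery. Here is a minimal PyMC sketch of that effect (the data, priors, and prediction grid are made up purely for illustration):

```python
import numpy as np
import pymc as pm

# made-up data with a roughly linear trend
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=30)
y = 1.0 + 0.5 * x + rng.normal(0, 1.0, size=30)

# prediction grid extending well past the observed range
x_new = np.linspace(-5, 25, 100)

with pm.Model():
    a = pm.Normal("a", 0, 10)
    b = pm.Normal("b", 0, 10)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("obs", a + b * x, sigma, observed=y)
    idata = pm.sample()

# posterior draws of the regression line evaluated on the grid
post = idata.posterior
mu_new = post["a"].values[..., None] + post["b"].values[..., None] * x_new

# the interval for the line is narrowest near the bulk of the data and
# fans out (the "bow tie") as x_new moves away from it
lower, upper = np.quantile(mu_new, [0.03, 0.97], axis=(0, 1))
print((upper - lower).round(2))
```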
Another thing to consider is that when you extrapolate, you may or may not have confidence that your model ( e.g. the model form ) still applies. Consider @cluhmann 's example, but suppose the data go nonlinear outside of the measurements. You would be severely underestimating the uncertainty and delivering an answer with higher confidence than you should, even with the bow tie effect. Only you can know, for your problem of interest, whether your model will be appropriate for extrapolation and how far you would trust it outside of the dataset.
Interesting. Am I right in thinking that what you’re alluding to is that while my data suggests a linear relationship, if I were to collect the data corresponding to the extrapolated region it might actually look like something else instead? E.g. that data might look more like an exponential curve, whereas my initial data suggested it was linear.
That’s exactly what I mean. This is something that a GP handles ( in some regards ) via the covariance function, but you still select which covariance function you use ( e.g. Gaussian vs. exponential ), and that choice has implications for how rapidly, and to what extent, your uncertainty grows.
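As a rough illustration of that last point, here is a small scikit-learn sketch ( toy data, arbitrary length scales ) comparing a squared-exponential ( "Gaussian" ) covariance with an exponential one ( a Matern kernel with nu = 1/2 ). The predictive standard deviation stays small near the observations and grows as you leave the data, at a rate and to a ceiling set by the covariance function:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern

# toy data: roughly linear inside the observed range
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=20)[:, None]
y = 1.0 + 0.5 * x.ravel() + rng.normal(0, 0.2, size=20)

# prediction grid extending well past the data
x_new = np.linspace(-5, 25, 200)[:, None]

kernels = {
    "Gaussian (squared exponential)": RBF(length_scale=2.0),
    "Exponential (Matern nu=1/2)": Matern(length_scale=2.0, nu=0.5),
}

for name, kernel in kernels.items():
    gp = GaussianProcessRegressor(kernel=kernel, alpha=0.04, normalize_y=True)
    gp.fit(x, y)
    mean, std = gp.predict(x_new, return_std=True)
    # std is small near the observations and grows away from them;
    # how fast it grows, and where it plateaus, depends on the kernel
    print(name, "std near data:", std[np.argmin(np.abs(x_new.ravel() - 5))].round(2),
          "std far from data:", std[-1].round(2))
```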
In my field ( material modeling ), the models that I construct are informed by physical laws which must be obeyed, so, at the very least, that helps me build models which are forced to remain within the laws of physics. But if I extrapolate too far, or even interpolate between very widely spaced points, there is no guarantee that I’ll be predictive. If you are trying to make inferences about things that don’t obey physical laws ( or at least you aren’t enforcing them in your modeling framework ), and instead are trying to intuit trends or assume some form that seems appropriate given the data you observed, you have to be very careful.
As a more concrete ( though kind of silly ) example:
I build a model for steel and I calibrate that model using data that I collect at room temperature. I’m probably okay being predictive in that regime, but I definitely won’t be predictive on the surface of the sun!
Another silly example:
I have a perfect model that captures the behavior of steam and ice. There is no guarantee that I will be accurate in the prediction of water.
You might be able to hedge your bets somewhat by doing something like Bayesian Model Averaging or Bayesian Model Combination, both of which are subsets of ensemble learning. I would at least consider something like that if you are looking to extrapolate far from the data or interpolate between very widely spaced clusters of points.
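If it helps, here is a rough sketch of one practical version of that idea: fit two candidate model forms in PyMC and get model weights from ArviZ. Note that this uses LOO-based stacking / pseudo-BMA weights rather than BMA proper ( which would weight models by their marginal likelihoods ), and the data and model forms are made up for illustration:

```python
import numpy as np
import pymc as pm
import arviz as az

# made-up data that is ambiguous between a linear and a quadratic trend
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=40)
y = 1.0 + 0.5 * x + 0.02 * x**2 + rng.normal(0, 0.5, size=40)

def fit_polynomial(order):
    with pm.Model():
        coefs = pm.Normal("coefs", 0, 10, shape=order + 1)
        mu = sum(coefs[k] * x**k for k in range(order + 1))
        sigma = pm.HalfNormal("sigma", 1)
        pm.Normal("obs", mu, sigma, observed=y)
        # pointwise log-likelihood is needed for LOO-based weights
        return pm.sample(idata_kwargs={"log_likelihood": True})

idatas = {"linear": fit_polynomial(1), "quadratic": fit_polynomial(2)}

# model weights from estimated out-of-sample fit (stacking / pseudo-BMA)
cmp = az.compare(idatas, ic="loo", method="stacking")
print(cmp[["elpd_loo", "weight"]])
# predictions from the two model forms can then be mixed using these weights
```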