Curve fitting accounting for data uncertainties in both x and y

Hi, I am attempting to determine the parameters in a nonlinear model to describe how a biological response depends on the concentration of a chemical. Through repeated measurements, I have an uncertainty in both the concentration and the response for each value.

Can anyone provide some advice on how to use PyMC to determine model parameters when the data to be fit have associated errors in both x and y?

Thanks for your help.

Kind regards, Gyro

The term of art for a model in which the x values are themselves measured with error is “measurement error model” (also called an errors-in-variables model). The error in y is usually rolled into the standard regression error term, along with modeling error.

The trick is to treat the true value as an unknown that generates the measured value. So if your measurement for a covariate is x_obs, you add a parameter for the true x and give x_obs a normal distribution with location x and a scale fit from the data. That assumes the measurement error is symmetric and normal; if it is not, you can fit an arbitrary measurement error model based on what you know about the measurement process. Then you run the regression on the true x rather than on x_obs. After the fit, you’ll want to check that the differences between your fitted x and x_obs are plausible given what you know about the measurement error process. A common point of failure in models like these is that all the explanatory power lands on an unrealistically large measurement error instead of on the regression itself.

Thank you! That was very helpful.