I’m currently trying to implement an **errors-in-variables** model (a regression model that accounts for errors in the independent variables) in PyMC3.

In a recently published paper (https://arxiv.org/pdf/1906.03989.pdf) I’ve read that

> Except for the simplest case of linear regression [29], EIV modeling almost always requires auxiliary information or data to correct the mismeasurement bias in estimation. […] However, without additional data, Bayesian EIV modeling is currently the most powerful and flexible approach, as it allows incorporating additional information in the form of distributional assumptions [15].
So let’s say we have a simple linear errors-in-variables model:

**y**_{t} = **α** + **β** **x**_{t}^{*} + **ε**_{t}

**x**_{t} = **x**_{t}^{*} + **η**_{t}

where **α** and **β** are the parameters of interest, **ε**_{t} and **η**_{t} are the errors and **x**_{t}^{*} denotes the true but unobserved regressor.

First I create some dummy data and add some error:

```
import numpy as np
intercept = 0
gradient = 2.5
size = 10000
x_true = np.linspace(0, 1, size)
y_true = intercept + gradient * x_true
# adding noise to data
x = x_true + np.random.normal(loc=0, scale=0.5, size=size)
y = y_true + np.random.normal(loc=0, scale=0.5, size=size)
```
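As a sanity check (my own addition, with a fixed seed so it is reproducible), a naive least-squares fit on this kind of noisy data shows the attenuation bias that motivates the EIV model: the estimated slope shrinks toward zero by the classical regression-dilution factor var(x*) / (var(x*) + σ_x²).

```
import numpy as np

rng = np.random.default_rng(0)
intercept, gradient, size = 0.0, 2.5, 10_000
x_true = np.linspace(0, 1, size)
y_true = intercept + gradient * x_true

# Same setup as above: noise on both variables
x = x_true + rng.normal(0, 0.5, size)
y = y_true + rng.normal(0, 0.5, size)

# Naive OLS slope, ignoring the error in x
slope_ols = np.polyfit(x, y, 1)[0]

# Theoretical attenuation: var(x_true) / (var(x_true) + sigma_x^2)
attenuation = np.var(x_true) / (np.var(x_true) + 0.5**2)
print(slope_ols, gradient * attenuation)  # both around 0.6, far below 2.5
```

So without accounting for the error in x, the gradient estimate is badly biased, which is exactly what the EIV model is supposed to fix.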

Then I created my model (reproduced from https://github.com/pymc-devs/pymc3/issues/2226 with some adjustments), where I used large standard deviations to get noninformative priors and modeled the “true” regressor **x**^{*} as a random variable, independent of the measurement error:

```
import pymc3 as pm

with pm.Model() as eiv_model:
    # Weakly informative priors for the regression parameters
    intercept = pm.Normal('intercept', mu=0, sd=20)
    gradient = pm.Normal('gradient', mu=0, sd=20)
    # Priors for the two measurement-error scales
    sigma_x = pm.HalfNormal('sigma_x', sd=20)
    sigma_y = pm.HalfNormal('sigma_y', sd=20)
    # Latent "true" regressor and the two observation likelihoods
    true_x = pm.Normal('true_x', mu=0, sd=10)
    likelihood_x = pm.Normal('x', mu=true_x, sd=sigma_x, observed=x)
    true_y = pm.Deterministic('true_y', intercept + gradient * true_x)
    likelihood_y = pm.Normal('y', mu=true_y, sd=sigma_y, observed=y)
```

As a result I get: *(trace plot omitted)*

And now I have some questions:
- Is this model a representative Bayesian errors-in-variables model or is there some bias I’m just not seeing?
- The inferences for the errors, true_x and true_y seem reasonable, but the uncertainty of the intercept and gradient is quite large. Is there a valid way to reduce that uncertainty if all that’s given are the noisy measurements of x and y?
- In the model from https://github.com/pymc-devs/pymc3/issues/2226, the shape parameter is used for modelling the true regressor x, but this is only necessary for multivariate models, right?
- Is it possible to modify the model so that the result corresponds to an
**orthogonal distance regression** model? How would the priors be formulated then?
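For reference, here is the classical total-least-squares (orthogonal distance) fit computed with plain numpy on the same kind of dummy data (my own sketch, not from the paper or the issue). Since the noise scales on x and y are equal here, TLS is consistent, and I’d expect a correctly specified Bayesian ODR-like model to give a similar gradient estimate:

```
import numpy as np

rng = np.random.default_rng(1)
size = 10_000
x_true = np.linspace(0, 1, size)
x_noisy = x_true + rng.normal(0, 0.5, size)
y_noisy = 2.5 * x_true + rng.normal(0, 0.5, size)

# Total least squares: the fitted line's direction is the leading
# right singular vector of the centered (x, y) data matrix.
X = np.column_stack([x_noisy - x_noisy.mean(), y_noisy - y_noisy.mean()])
_, _, vt = np.linalg.svd(X, full_matrices=False)
dx, dy = vt[0]                     # direction of the fitted line
slope_tls = dy / dx
intercept_tls = y_noisy.mean() - slope_tls * x_noisy.mean()
print(slope_tls, intercept_tls)    # slope close to 2.5, unlike naive OLS
```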

I’m thankful for any advice, since I’m new to Bayesian inference and probabilistic programming.