Errors-in-variables model in PyMC3

I think its easiest to think of errors-in-variables as a repeated-measures model, and then take the number of replicates to be 1 so that:

x_{t,i} \sim N(x_t, \sigma^2_x)
y_{t,i} \sim N(\alpha + \beta x_t, \sigma^2_y)

Then x_t should be a vector and not, as in your case, a scalar, so add shape=x.shape[0] to true_x.

This explains why the intercept and gradient do not differ much from their prior. As true_x was a scalar, it basically estimates the mean and standard deviation of your data (x), so the effective number of samples you’re using is 1 instead of 10000.

With multiple replicates, you have a good amount of information to constrain the variance \sigma_x^2. When you take the number of replicates to 1, you lose the ability to decompose the residual variance into \sigma_x^2 and \sigma_y^2. This is a consequence of the fact that the model is underdetermined; so with this parametrization you should expect divergences to occur as a matter of course.

ODR is a re-parameterization under an assumed variance ratio (\sigma_y^2 = c \cdot \sigma_x^2; c=1 corresponds to ODR, c \neq 1 to weighted ODR). Thus, for classical ODR, the only change is to replace sigma_x and sigma_y with a single sigma_odr.

One final note: Because the variance of your x data is large compared to the range, there is a lot of wiggle room between the slope and the intercept, so even though the true value of the intercept is 0, the marginal mean winds up being close to 1.

Increasing the range of x to np.linspace(0, 10, size) results in a better ability to partition between the slope and intercept terms.

with pm.Model() as model:
    err_odr = pm.HalfNormal('err_odr', 5.)
    err_param = pm.HalfNormal('err_param', 5.)
    x_lat = pm.Normal('x_mu', 0, 5., shape=x.shape[0])
    x_obs = pm.Normal('x_obs', mu=x_lat, sd=err_odr, observed=x, shape=x.shape[0])
    offset = pm.Normal('offset', 0, err_param)
    slope = pm.Normal('slope', 0, err_param)
    y_pred = pm.Deterministic('y_pred', offset + slope * x_lat)
    y_obs = pm.Normal('y', mu=y_pred, sd=err_odr, observed=y)
    trace = pm.sample(800, tune=1500, chains=4, cores=3, init='jitter+adapt_diag')