# Observational error in logistic regression

Hi!

I am trying to correctly simulate the confidence bounds in a logistic regression where x is not perfectly known. Let's assume I have a pair of vectors of observations (cond, data), where cond is discrete and data is continuous and normally distributed with an observational error sigma. So,

```python
with pm.Model() as model:
    beta = pm.Normal("beta", mu=0, tau=0.01)
    alpha = pm.Normal("alpha", mu=0, tau=0.01)
    data_noise = pm.Normal("data_noise", mu=data, sigma=sigma)
    p = pm.Deterministic("p", 1.0 / (1.0 + np.exp(beta * data_noise + alpha)))

with model:
    observed = pm.Bernoulli("observed", p, observed=cond)
    trace = pm.sampling_jax.sample_numpyro_nuts(tune=1000, draws=10000, chains=4, random_seed=307)
```
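As a side note on the model above (not raised later in the thread): `1.0 / (1.0 + np.exp(beta*data_noise + alpha))` is the logistic function of the *negated* linear predictor, i.e. `expit(-(beta*x + alpha))`, so the sign of `beta` is flipped relative to the usual `invlogit(beta*x + alpha)` parameterization. A quick numpy check of that identity:

```python
import numpy as np
from scipy.special import expit  # expit(z) = 1 / (1 + exp(-z))

beta, alpha = 3.14, -10.0  # arbitrary values for the check
x = np.linspace(-5.0, 5.0, 11)

# p as written in the model above:
p_model = 1.0 / (1.0 + np.exp(beta * x + alpha))

# Identical to the logistic of the negated linear predictor:
p_expit = expit(-(beta * x + alpha))

print(np.allclose(p_model, p_expit))  # True
```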

The code samples, but the distributions of alpha and beta seem to be insensitive to the value of sigma (and hence so does the overall logistic inference). I would have expected broader distributions for (alpha, beta) as sigma increases. However, the distributions of data_noise do become wider as sigma increases.

Am I doing something wrong? And more generally: is this the standard way to introduce uncertainties in observations in the PyMC framework?

Your variable `data_obs` is not defined. Did you mean to use `data_noise`? Like this:

```python
p = pm.Deterministic("p", 1.0/(1. + np.exp(beta*data_noise + alpha)))
```

Indeed! I will change the original post.

I guess my question was whether that was the reason you weren’t seeing any connection between `sigma` and the posteriors of `alpha` and `beta`.

Sadly, it is not. I was just changing variable names to clarify the example and missed that one.

Things seem to work ok here. When I set `sigma=0.1`, I get this:

```
In : az.hdi(trace, var_names=["alpha", "beta"])
<xarray.Dataset>
Dimensions:  (hdi: 2)
Coordinates:
  * hdi      (hdi) <U6 'lower' 'higher'
Data variables:
    alpha    (hdi) float64 -3.263 0.9269
    beta     (hdi) float64 0.01892 0.3808
```

When I set `sigma=10`, I get this:

```
In : az.hdi(trace, var_names=["alpha", "beta"])
<xarray.Dataset>
Dimensions:  (hdi: 2)
Coordinates:
  * hdi      (hdi) <U6 'lower' 'higher'
Data variables:
    alpha    (hdi) float64 -18.85 16.02
    beta     (hdi) float64 0.215 18.57
```

Here is my code:

```python
import pymc as pm
import numpy as np
import scipy
import arviz as az
import matplotlib.pyplot as plt

rng = np.random.default_rng(12345)

n = 10
data_true = rng.integers(-10, high=10, size=n)
data = data_true + rng.normal(scale=10, size=n)
ps = scipy.special.expit(-10 + (3.14 * data_true))
cond = rng.binomial(1, ps)

with pm.Model() as model:
    beta = pm.Normal("beta", mu=0, tau=0.01)
    alpha = pm.Normal("alpha", mu=0, tau=0.01)
    data_noise = pm.Normal("data_noise", mu=data, sigma=10)
    observed = pm.Bernoulli("observed", pm.math.invlogit(beta * data_noise + alpha), observed=cond)
    trace = pm.sample()

print(az.hdi(trace, var_names=["alpha", "beta"]))
```

Thank you for your feedback. I tested your implementation against mine, since they are essentially the same, except that you used a very small number of observations while I have more than 8,000. At that scale, it seems that observational error of almost any magnitude has practically no effect on the goodness of fit.
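One way to sanity-check that conclusion outside PyMC is to compare a plain maximum-likelihood logistic fit on the true versus the noisy predictor at a comparable sample size. This is a sketch, not from the thread: the true parameters, the noise level, and the `scipy.optimize` fit are all assumptions for illustration. It also shows the classical caveat that, even when large n keeps the posteriors narrow, measurement error in x still attenuates the slope estimate toward zero:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(307)

# Hypothetical setup mirroring the large-n case; alpha_true, beta_true,
# and sigma are illustrative assumptions, not values from the thread.
n, alpha_true, beta_true, sigma = 8000, -1.0, 1.0, 2.0
x_true = rng.normal(0.0, 3.0, size=n)
x_obs = x_true + rng.normal(0.0, sigma, size=n)  # noisy predictor
y = rng.binomial(1, expit(alpha_true + beta_true * x_true))

def neg_log_lik(params, x, y):
    """Negative Bernoulli log-likelihood for a plain logistic fit."""
    a, b = params
    z = a + b * x
    # log p(y|z) = y*z - log(1 + exp(z)); logaddexp keeps it stable
    return np.sum(np.logaddexp(0.0, z) - y * z)

fit_true = minimize(neg_log_lik, x0=[0.0, 0.0], args=(x_true, y))
fit_obs = minimize(neg_log_lik, x0=[0.0, 0.0], args=(x_obs, y))

print("slope fit on true x :", fit_true.x[1])
print("slope fit on noisy x:", fit_obs.x[1])  # attenuated toward zero
```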

Thank you again!
