Hi!

I am trying to correctly simulate the confidence bounds in a logistic regression where x is not perfectly known. Let's assume I have a pair of vectors of observations (cond, data), where cond is discrete and data is continuous and normally distributed with an observational error sigma. So,

```
with pm.Model() as model:
    beta = pm.Normal("beta", mu=0, tau=0.01)
    alpha = pm.Normal("alpha", mu=0, tau=0.01)
    data_noise = pm.Normal("data_noise", mu=data, sigma=sigma)
    p = pm.Deterministic("p", 1.0 / (1.0 + np.exp(beta * data_noise + alpha)))

with model:
    observed = pm.Bernoulli("observed", p, observed=cond)
    trace = pm.sampling_jax.sample_numpyro_nuts(tune=1000, draws=10000, chains=4, random_seed=307)
```

The code samples, but the distributions of alpha and beta seem to be insensitive to the value of sigma (and hence so does the overall logistic inference). I would have expected broader distributions for (alpha, beta) as sigma increases. However, the distributions of data_noise do become wider as sigma increases.

Am I doing something wrong? And more generally: is this the standard way to introduce uncertainties in observations in the PyMC framework?

Your variable `data_obs` is not defined. Did you mean to use `data_noise`? Like this:

```
p = pm.Deterministic("p", 1.0/(1. + np.exp(beta*data_noise + alpha)))
```

Indeed! I will change the original post.

I guess my question was whether that was the reason you weren’t seeing any connection between `sigma` and the posteriors of `alpha` and `beta`.

Sadly, it is not. I was just changing variable names to clarify the example and I missed that one.

Things seem to work ok here. When I set `sigma=0.1`, I get this:

```
In [22]: az.hdi(trace, var_names=["alpha", "beta"])
<xarray.Dataset>
Dimensions:  (hdi: 2)
Coordinates:
  * hdi      (hdi) <U6 'lower' 'higher'
Data variables:
    alpha    (hdi) float64 -3.263 0.9269
    beta     (hdi) float64 0.01892 0.3808
```

When I set `sigma=10`, I get this:

```
In [24]: az.hdi(trace, var_names=["alpha", "beta"])
<xarray.Dataset>
Dimensions:  (hdi: 2)
Coordinates:
  * hdi      (hdi) <U6 'lower' 'higher'
Data variables:
    alpha    (hdi) float64 -18.85 16.02
    beta     (hdi) float64 0.215 18.57
```

Here is my code:

```
import pymc as pm
import numpy as np
import scipy
import arviz as az
import matplotlib.pyplot as plt

rng = np.random.default_rng(12345)
n = 10
data_true = rng.integers(-10, high=10, size=n)
data = data_true + rng.normal(scale=10, size=n)
ps = scipy.special.expit(-10 + (3.14 * data_true))
cond = rng.binomial(1, ps)

with pm.Model() as model:
    beta = pm.Normal("beta", mu=0, tau=0.01)
    alpha = pm.Normal("alpha", mu=0, tau=0.01)
    data_noise = pm.Normal("data_noise", mu=data, sigma=10)
    observed = pm.Bernoulli("observed", pm.math.invlogit(beta*data_noise + alpha), observed=cond)
    trace = pm.sample()

print(az.hdi(trace, var_names=["alpha", "beta"]))
```


Thank you for your feedback. I was testing your implementation against mine, since they are essentially the same. **Except** that you used a very small number of observations, and I have >8000. In this context, it seems that observational error of almost any magnitude has practically no consequence for the goodness of fit.
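For anyone reading along, here is a minimal sketch of how one can check this sample-size effect without running MCMC at all: it fits the same logistic model by maximum likelihood (ignoring the measurement error in x) at n=10 and n=8000, using only NumPy and SciPy. The data generation mirrors the example above; the `fit_logistic` helper and the use of Nelder-Mead are my own choices for illustration, not part of the original code.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def fit_logistic(x, y):
    """Maximum-likelihood fit of y ~ Bernoulli(expit(alpha + beta*x))."""
    def nll(params):
        alpha, beta = params
        p = expit(alpha + beta * x)
        eps = 1e-9  # guard against log(0) at extreme probabilities
        return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return minimize(nll, x0=[0.0, 0.0], method="Nelder-Mead").x

rng = np.random.default_rng(12345)
for n in (10, 8000):
    x_true = rng.integers(-10, high=10, size=n).astype(float)
    x_noisy = x_true + rng.normal(scale=10, size=n)   # observational error, sigma = 10
    y = rng.binomial(1, expit(-10 + 3.14 * x_true))
    alpha_hat, beta_hat = fit_logistic(x_noisy, y)
    print(f"n={n}: alpha_hat={alpha_hat:.2f}, beta_hat={beta_hat:.2f}")
```

With large n the point estimates stabilize (though noisy x still attenuates the slope toward zero), which is consistent with the posterior width depending mostly on n rather than sigma.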

Thank you again!
