Observational error in logistic regression

verderis · March 2, 2023, 6:54pm

Hi!

I am trying to correctly simulate the confidence bounds in a logistic regression were x is not perfectly known. Lest assume I have a pair of vectors of observations (cond, data), where cond is discrete and data is continuous and normally distributed with an observational error sigma. So,

with pm.Model() as model:
    beta = pm.Normal("beta", mu=0, tau=0.01)
    alpha = pm.Normal("alpha", mu=0, tau=0.01)
    data_noise = pm.Normal("data_noise", mu=data, sigma=sigma)
    p = pm.Deterministic("p", 1.0/(1. + np.exp(beta*data_noise + alpha)))

with model:
    observed = pm.Bernoulli("observed", p, observed=cond)
    trace = pm.sampling_jax.sample_numpyro_nuts(tune=1000, draws=10000, chains=4, random_seed=307)

The code samples, but the distribution of alpha and beta seems to be insensitive to the value of sigma (and so the overall logistic inference). I would expected broader distributions for (alpha, beta) as sigma increases. However, the distributions of data_noise do become wider as sigma increases.

Am I doing something wrong? And more generally: is this the standard way to introduce uncertainties in observations in the PyMC framework?

cluhmann · March 2, 2023, 7:16pm

Your variable data_obs is not defined. Did you mean to use data_noise? Like this:

p = pm.Deterministic("p", 1.0/(1. + np.exp(beta*data_noise + alpha)))

verderis · March 2, 2023, 7:17pm

Indeed! I will change the original post

cluhmann · March 2, 2023, 7:56pm

I guess my question was whether that was the reason you weren’t seeing any connection between sigma and the posteriors of alpha and beta.

verderis · March 2, 2023, 8:00pm

Sadly is not. I was just changing variable names to clarify the example and I missed that one.

cluhmann · March 2, 2023, 8:17pm

Things seem to work ok here. When I set sigma=0.1, I get this:

In [22]: az.hdi(trace, var_names=["alpha", "beta"])
<xarray.Dataset>
Dimensions:  (hdi: 2)
Coordinates:
  * hdi      (hdi) <U6 'lower' 'higher'
Data variables:
    alpha    (hdi) float64 -3.263 0.9269
    beta     (hdi) float64 0.01892 0.3808

When I set sigma=10, I get this:

In [24]: az.hdi(trace, var_names=["alpha", "beta"])
<xarray.Dataset>
Dimensions:  (hdi: 2)
Coordinates:
  * hdi      (hdi) <U6 'lower' 'higher'
Data variables:
    alpha    (hdi) float64 -18.85 16.02
    beta     (hdi) float64 0.215 18.57

Here is my code:

import pymc as pm
import numpy as np
import scipy
import arviz as az
import matplotlib.pyplot as plt

rng = np.random.default_rng(12345)

n = 10
data_true = rng.integers(-10, high=10, size=n)
data = data_true + rng.normal(scale=10, size=n)
ps = scipy.special.expit(-10 + (3.14 * data_true))
cond = rng.binomial(1, ps)

with pm.Model() as model:
    beta = pm.Normal("beta", mu=0, tau=0.01)
    alpha = pm.Normal("alpha", mu=0, tau=0.01)
    data_noise = pm.Normal("data_noise", mu=data, sigma=10)
    observed = pm.Bernoulli("observed", pm.math.invlogit(beta*data_noise + alpha), observed=cond)
    trace = pm.sample()

print(az.hdi(trace, var_names=["alpha", "beta"]))

verderis · March 2, 2023, 9:14pm

Thank you for your feedback. I was testing your implementation against mine, since they are essentially the same. Except, that you used a very small number of observations, and I have >8000. In this context, it seems that error in the observations of almost any magnitude is of practically no consequence to the goodness of fit.

Thank you again!

Topic		Replies	Views
Lack of convergence in a super simple toy model Questions	3	486	July 17, 2021
Building a normal likelihood with known but varying variance for every data point/sample Questions	1	399	February 20, 2021
Seeds: Random effect logistic regression Questions	31	1274	March 29, 2022
Measurement Uncertainties in Multiple Regression Questions	3	1412	April 28, 2020
Observed data with uncertainty described by an irregular distribution v5	26	591	March 1, 2024

Observational error in logistic regression

Related topics