Hi!
I am trying to correctly simulate the confidence bounds in a logistic regression were x is not perfectly known. Lest assume I have a pair of vectors of observations (cond, data), where cond is discrete and data is continuous and normally distributed with an observational error sigma. So,
with pm.Model() as model:
beta = pm.Normal("beta", mu=0, tau=0.01)
alpha = pm.Normal("alpha", mu=0, tau=0.01)
data_noise = pm.Normal("data_noise", mu=data, sigma=sigma)
p = pm.Deterministic("p", 1.0/(1. + np.exp(beta*data_noise + alpha)))
with model:
observed = pm.Bernoulli("observed", p, observed=cond)
trace = pm.sampling_jax.sample_numpyro_nuts(tune=1000, draws=10000, chains=4, random_seed=307)
The code samples, but the distribution of alpha and beta seems to be insensitive to the value of sigma (and so the overall logistic inference). I would expected broader distributions for (alpha, beta) as sigma increases. However, the distributions of data_noise do become wider as sigma increases.
Am I doing something wrong? And more generally: is this the standard way to introduce uncertainties in observations in the PyMC framework?
Your variable data_obs
is not defined. Did you mean to use data_noise
? Like this:
p = pm.Deterministic("p", 1.0/(1. + np.exp(beta*data_noise + alpha)))
Indeed! I will change the original post
I guess my question was whether that was the reason you weren’t seeing any connection between sigma
and the posteriors of alpha
and beta
.
Sadly is not. I was just changing variable names to clarify the example and I missed that one.
Things seem to work ok here. When I set sigma=0.1
, I get this:
In [22]: az.hdi(trace, var_names=["alpha", "beta"])
<xarray.Dataset>
Dimensions: (hdi: 2)
Coordinates:
* hdi (hdi) <U6 'lower' 'higher'
Data variables:
alpha (hdi) float64 -3.263 0.9269
beta (hdi) float64 0.01892 0.3808
When I set sigma=10
, I get this:
In [24]: az.hdi(trace, var_names=["alpha", "beta"])
<xarray.Dataset>
Dimensions: (hdi: 2)
Coordinates:
* hdi (hdi) <U6 'lower' 'higher'
Data variables:
alpha (hdi) float64 -18.85 16.02
beta (hdi) float64 0.215 18.57
Here is my code:
import pymc as pm
import numpy as np
import scipy
import arviz as az
import matplotlib.pyplot as plt
rng = np.random.default_rng(12345)
n = 10
data_true = rng.integers(-10, high=10, size=n)
data = data_true + rng.normal(scale=10, size=n)
ps = scipy.special.expit(-10 + (3.14 * data_true))
cond = rng.binomial(1, ps)
with pm.Model() as model:
beta = pm.Normal("beta", mu=0, tau=0.01)
alpha = pm.Normal("alpha", mu=0, tau=0.01)
data_noise = pm.Normal("data_noise", mu=data, sigma=10)
observed = pm.Bernoulli("observed", pm.math.invlogit(beta*data_noise + alpha), observed=cond)
trace = pm.sample()
print(az.hdi(trace, var_names=["alpha", "beta"]))
1 Like
Thank you for your feedback. I was testing your implementation against mine, since they are essentially the same. Except, that you used a very small number of observations, and I have >8000. In this context, it seems that error in the observations of almost any magnitude is of practically no consequence to the goodness of fit.
Thank you again!
1 Like