The example notebook here https://docs.pymc.io/notebooks/censored_data.html is about censored data. But the first example figure shows truncated data, not censored data. The line
# Censor samples
censored = samples[(samples > low) & (samples < high)]
is actually truncation. So the histograms of the censored data is wrong and could be pretty confusing.
As in, it should be more like this…
# Produce normally distributed samples
np.random.seed(1618)
size = 500
μ = 13.0
σ = 5.0
samples = np.random.normal(μ, σ, size)
# Set censoring limits
high = 16.0
low = -1.0
# Truncate samples
def truncate(samples, low, high):
return samples[(samples > low) & (samples < high)]
# Censor samples
def censor(samples, low, high):
truncated_samples = truncate(samples, low, high)
censored_samples = samples.copy()
censored_samples[censored_samples < low] = low
censored_samples[censored_samples > high] = high
n_right_censored = len(samples[samples >= high])
n_left_censored = len(samples[samples <= low])
n_observed = len(samples) - n_right_censored - n_left_censored
return (
censored_samples,
truncated_samples,
n_left_censored,
n_right_censored,
n_observed,
)
# Visualize uncensored and censored data
_, axarr = plt.subplots(nrows=3, figsize=[12, 8], sharex=True, sharey=True)
for i, data in enumerate([samples, truncated_samples, censored_samples]):
sns.distplot(data, ax=axarr[i])
axarr[0].set_title("Uncensored")
axarr[1].set_title("Truncated")
axarr[2].set_title("Censored")
plt.show()
Resulting in this
1 Like
This is a good point. I think the best is to cross reference with the how Stan models these: https://mc-stan.org/docs/2_22/stan-users-guide/truncated-or-censored-data.html