Error in Censored Data Models example notebook

The example notebook here https://docs.pymc.io/notebooks/censored_data.html is about censored data. But the first example figure shows truncated data, not censored data. The line

# Censor samples
censored = samples[(samples > low) & (samples < high)]

is actually truncation. So the histograms of the censored data is wrong and could be pretty confusing.

As in, it should be more like this…

# Produce normally distributed samples
np.random.seed(1618)
size = 500
μ = 13.0
σ = 5.0
samples = np.random.normal(μ, σ, size)

# Set censoring limits
high = 16.0
low = -1.0

# Truncate samples
def truncate(samples, low, high):
    return samples[(samples > low) & (samples < high)]


# Censor samples
def censor(samples, low, high):
    truncated_samples = truncate(samples, low, high)

    censored_samples = samples.copy()
    censored_samples[censored_samples < low] = low
    censored_samples[censored_samples > high] = high

    n_right_censored = len(samples[samples >= high])
    n_left_censored = len(samples[samples <= low])
    n_observed = len(samples) - n_right_censored - n_left_censored
    return (
        censored_samples,
        truncated_samples,
        n_left_censored,
        n_right_censored,
        n_observed,
    )

# Visualize uncensored and censored data
_, axarr = plt.subplots(nrows=3, figsize=[12, 8], sharex=True, sharey=True)
for i, data in enumerate([samples, truncated_samples, censored_samples]):
    sns.distplot(data, ax=axarr[i])
axarr[0].set_title("Uncensored")
axarr[1].set_title("Truncated")
axarr[2].set_title("Censored")
plt.show()

Resulting in this

1 Like

This is a good point. I think the best is to cross reference with the how Stan models these: https://mc-stan.org/docs/2_22/stan-users-guide/truncated-or-censored-data.html