Error in Censored Data Models example notebook

drbenvincent · March 3, 2020, 1:49pm

The example notebook here https://docs.pymc.io/notebooks/censored_data.html is about censored data. But the first example figure shows truncated data, not censored data. The line

# Censor samples
censored = samples[(samples > low) & (samples < high)]

is actually truncation: it drops the out-of-range samples instead of recording them at the limits. So the histogram of the censored data is wrong and could be pretty confusing.
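
To make the distinction concrete, here is a minimal sketch on a toy array. The five values are made up for illustration, and `np.clip` is just one convenient way to censor; it is not what the notebook does.

```python
import numpy as np

# Toy values chosen for illustration only (not from the notebook)
samples = np.array([-3.0, 2.0, 10.0, 15.0, 20.0])
low, high = -1.0, 16.0

# Truncation: out-of-range values are dropped, so the array shrinks
truncated = samples[(samples > low) & (samples < high)]

# Censoring: out-of-range values are kept but recorded at the limit
censored = np.clip(samples, low, high)

print(truncated.tolist())  # [2.0, 10.0, 15.0]
print(censored.tolist())   # [-1.0, 2.0, 10.0, 15.0, 16.0]
```

The censored array has the same length as the original, with the extremes piled up at `low` and `high`; the truncated array has simply lost them.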

drbenvincent · March 4, 2020, 12:23pm

As in, it should be more like this…

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Produce normally distributed samples
np.random.seed(1618)
size = 500
μ = 13.0
σ = 5.0
samples = np.random.normal(μ, σ, size)

# Set censoring limits
high = 16.0
low = -1.0

# Truncate samples
def truncate(samples, low, high):
    return samples[(samples > low) & (samples < high)]


# Censor samples
def censor(samples, low, high):
    truncated_samples = truncate(samples, low, high)

    censored_samples = samples.copy()
    censored_samples[censored_samples < low] = low
    censored_samples[censored_samples > high] = high

    n_right_censored = len(samples[samples >= high])
    n_left_censored = len(samples[samples <= low])
    n_observed = len(samples) - n_right_censored - n_left_censored
    return (
        censored_samples,
        truncated_samples,
        n_left_censored,
        n_right_censored,
        n_observed,
    )

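# Apply censoring (and truncation) to the samples
(
    censored_samples,
    truncated_samples,
    n_left_censored,
    n_right_censored,
    n_observed,
) = censor(samples, low, high)

# Visualize uncensored, truncated and censored data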
_, axarr = plt.subplots(nrows=3, figsize=[12, 8], sharex=True, sharey=True)
for i, data in enumerate([samples, truncated_samples, censored_samples]):
    sns.distplot(data, ax=axarr[i])
axarr[0].set_title("Uncensored")
axarr[1].set_title("Truncated")
axarr[2].set_title("Censored")
plt.show()

Resulting in this: [figure: three stacked histograms titled "Uncensored", "Truncated", and "Censored"]

junpenglao · March 4, 2020, 2:31pm

This is a good point. I think the best approach is to cross-reference how Stan models these: https://mc-stan.org/docs/2_22/stan-users-guide/truncated-or-censored-data.html
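
For reference, one of the approaches in that Stan chapter integrates the censored values out: each observed point contributes its normal log density, each left-censored point contributes the log CDF at the lower limit, and each right-censored point contributes the log complementary CDF at the upper limit. Below is a rough plain-Python sketch of that log-likelihood. The function names and the toy numbers are hypothetical, chosen here for illustration; this is not the notebook's or PyMC's implementation.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_logcdf(x, mu, sigma):
    """log P(X <= x); will fail with log(0) very far in the lower tail."""
    return math.log(0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0))))


def normal_logsf(x, mu, sigma):
    """log P(X > x); will fail with log(0) very far in the upper tail."""
    return math.log(0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0))))


def censored_loglik(observed, n_left, n_right, low, high, mu, sigma):
    """Log-likelihood with the censored values integrated out."""
    ll = sum(normal_logpdf(x, mu, sigma) for x in observed)
    ll += n_left * normal_logcdf(low, mu, sigma)    # mass below the lower limit
    ll += n_right * normal_logsf(high, mu, sigma)   # mass above the upper limit
    return ll


# Toy inputs for illustration: three observed values, one censored on each side
ll = censored_loglik([2.0, 10.0, 15.0], 1, 1, low=-1.0, high=16.0, mu=13.0, sigma=5.0)
print(ll)
```

Only the counts of censored points enter the likelihood, which is why a `censor` helper that returns `n_left_censored` and `n_right_censored` alongside the observed values is a convenient shape for this kind of model.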
