Imputing Censored Data with Unique Censor Points

Hello all,

I am attempting to re-create JAGS models from an existing R project in PyMC. However, I am hitting a snag with censored-data imputation. (Paper here if bored.)

I need to impute mostly left-censored data, and occasionally interval- and right-censored data, within a small dataset that may have different censoring points per measurement. For example, results may come from different laboratories with different Limits of Quantification (LOQ).

The model is relatively simple. For example:

# Example of data with left, right and interval censoring

data = ['28.9', '<19.4', '<5.5', '>149.9', '26.42', '56.1', '[25.6-50.2]']

I can follow the example for censored data here, but only for the non-imputed section. I found that you can pass an array for lower/upper to pm.Censored, but I can't figure out how to get it working with pm.distributions.transforms.Interval (or whatever else I’ll need to do).

Example of Data Pre-Processing

# Censored Data Processing

from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class CensoredData:
    y: List[float]
    lower_limits: List[float]
    upper_limits: List[float]


def define_observed_lower_upper(analysis: List[str]) -> CensoredData:
    y = []
    lower_limits = []
    upper_limits = []

    for value in analysis:
        if value.startswith('<'):
            # Left-censored: record the LOQ as both the observation and the lower bound
            limit = float(value[1:])  # Extract the numeric value
            y.append(limit)
            lower_limits.append(limit)
            upper_limits.append(np.inf)
        elif value.startswith('>'):
            # Right-censored: record the limit as both the observation and the upper bound
            limit = float(value[1:])  # Extract the numeric value
            y.append(limit)
            lower_limits.append(-np.inf)
            upper_limits.append(limit)
        elif '-' in value:
            # Interval-censored
            # TODO: Test once imputation is figured out
            lower, upper = map(float, value.strip('[]').split('-'))  # Extract the range, e.g. '[25.6-50.2]'
            y.append((lower + upper) / 2)
            lower_limits.append(lower)
            upper_limits.append(upper)
        else:
            # Uncensored
            y.append(float(value))
            lower_limits.append(-np.inf)
            upper_limits.append(np.inf)

    return CensoredData(y, lower_limits, upper_limits)

test_data = ['25.7', '17.1', '168', '85.3', '66.4',  '<49.8',  '33.2', '<24.4', '38.3']
data = define_observed_lower_upper(test_data)
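
For reference, the helper produces arrays like these for test_data above: the two '<' entries get a finite lower limit with the observation recorded at their LOQ, and everything else stays unbounded (illustrative output from running the snippet):

print(data.y)
# [25.7, 17.1, 168.0, 85.3, 66.4, 49.8, 33.2, 24.4, 38.3]
print(data.lower_limits)
# [-inf, -inf, -inf, -inf, -inf, 49.8, -inf, 24.4, -inf]
print(data.upper_limits)
# [inf, inf, inf, inf, inf, inf, inf, inf, inf]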

Example of a Left-Censored Model without Imputation

# Bayesian model with pm.Censored
import pymc as pm

with pm.Model() as model:
    # Priors
    mu = pm.Uniform("mu", lower=-20, upper=20)
    log_sigma = pm.Normal("log_sigma", mu=-0.1744, sigma=2.5523)
    sigma = pm.Deterministic("sigma", pm.math.exp(log_sigma))
    precision = pm.Deterministic("precision", 1 / (sigma ** 2))

    # Likelihood with censoring
    censored_observations = pm.Censored(
        "y",
        pm.Lognormal.dist(mu=mu, tau=precision),
        lower=data.lower_limits,
        upper=data.upper_limits,
        observed=data.y,
    )

    # Sampling
    trace = pm.sample(
        draws=25000,
        tune=5000,
        chains=4,
        return_inferencedata=True,
        target_accept=0.90,
        initvals={
            "mu": np.log(0.3),
            "log_sigma": np.log(2.5),
        },
    )

This is (one of) the JAGS models I am converting:

model {

    ## prior on mu
    mu ~ dunif(",mu.lower,",",mu.upper,")

    ## prior on sigma
    precision <- 1/(sigma*sigma)
    sigma <- exp(log.sigma)
    log.sigma ~ dnorm(",log.sigma.mu,",",log.sigma.prec,")

    ### likelihood
    for (i in 1:n.obs) {
        y[i] ~ dlnorm(mu, precision)
        CensorType[i] ~ dinterval(y[i], censorLimitMat[i,])
    }
}

You can impute after sampling is done by drawing from a truncated version of the distribution with the interval reversed. If you had a left-censored observation, you can sample from a distribution right-truncated at the same point, and so on.

This can be done with pm.sample_posterior_predictive in a new model.
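
For anyone landing here later, a minimal sketch of that idea, assuming the trace from the left-censored model above and a single hypothetical left-censored record with LOQ 49.8. The pm.Flat variables are only placeholders so the posterior draws of mu and log_sigma are taken from the trace rather than resampled:

# Impute a left-censored value by sampling from the reversed (right-truncated) distribution
import numpy as np
import pymc as pm

loq = 49.8  # hypothetical LOQ for one left-censored record

with pm.Model() as imputation_model:
    # Placeholders: values come from the existing posterior trace, not from these priors
    mu = pm.Flat("mu")
    log_sigma = pm.Flat("log_sigma")
    sigma = pm.math.exp(log_sigma)

    # Left-censored observation -> LogNormal truncated from above at the LOQ
    imputed_y = pm.Truncated(
        "imputed_y",
        pm.LogNormal.dist(mu=mu, sigma=sigma),
        lower=None,
        upper=loq,
    )

    # Push the posterior draws of mu and log_sigma through the truncated distribution
    imputations = pm.sample_posterior_predictive(trace, var_names=["imputed_y"])

Repeat for each censored record; a right-censored observation would use lower=limit, upper=None instead.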

Thanks for this; it was pretty easy to implement. After reviewing the results, it looks like the original authors actually just integrated out the censored data, similar to how I assume pm.Censored() works. Good to have that trick up my sleeve for the future, though.

I’m seeing some ‘slight’ differences between the results I am getting with my model using pm.Censored() and what was published utilising R + JAGS, but I’ll get in touch with the authors first and make another thread if we cannot figure it out.


Yes, pm.Censored integrates out the logp of the censored values.
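
A quick way to see that (my own sketch, not from the thread): for an observation recorded exactly at the lower censoring bound, pm.Censored assigns the base distribution's log-CDF at that bound, i.e. the mass below the limit is integrated out rather than a value being imputed.

import numpy as np
import pymc as pm

# Base distribution and a left-censoring bound at -1
dist = pm.Normal.dist(mu=0.0, sigma=1.0)
censored = pm.Censored.dist(dist, lower=-1.0, upper=None)

# Log-probability of an observation at the bound vs. the log-CDF there
logp_at_bound = pm.logp(censored, -1.0).eval()
logcdf_at_bound = pm.logcdf(dist, -1.0).eval()

print(np.isclose(logp_at_bound, logcdf_at_bound))  # expected: True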