Imputing Censored Data with Unique Censor Points

Hello all,

I am attempting to re-create JAGS models from an existing R project in PyMC. However, I am hitting a snag with censored-data imputation. (Paper here if bored.)

I need to impute mostly left-censored data, and occasionally interval- and right-censored data, within a small dataset that may have different censoring points per measurement. For example, results may come from different laboratories with different Limits of Quantification (LOQ).

The model is relatively simple. For example:

# Example of data with left, right and interval censoring

data = ['28.9', '<19.4', '<5.5', '>149.9', '26.42', '56.1', '[25.6-50.2]']

I can follow the example for censored data here, but only for the non-imputed section. I found that you can pass an array for lower/upper to pm.Censored, but I can't figure out how to get it working with pm.distributions.transforms.Interval (or whatever else I’ll need to do).

Example of Data Pre-Processing

# Censored Data Processing

from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class CensoredData:
    y: List[float]
    lower_limits: List[float]
    upper_limits: List[float]


def define_observed_lower_upper(analysis: List[str]) -> CensoredData:
    y = []
    lower_limits = []
    upper_limits = []

    for value in analysis:
        if value.startswith('<'):
            # Left-censored: record the LOQ as both the observation and the lower bound
            limit = float(value[1:])  # Extract the numeric value
            y.append(limit)
            lower_limits.append(limit)
            upper_limits.append(np.inf)
        elif value.startswith('>'):
            # Right-censored: record the limit as both the observation and the upper bound
            limit = float(value[1:])  # Extract the numeric value
            y.append(limit)
            lower_limits.append(-np.inf)
            upper_limits.append(limit)
        elif '-' in value:
            # Interval-censored
            # TODO: Test once imputation is figured out
            lower, upper = map(float, value.strip('[]').split('-'))  # Extract the range, e.g. '[25.6-50.2]'
            y.append((lower + upper) / 2)
            lower_limits.append(lower)
            upper_limits.append(upper)
        else:
            # Uncensored
            y.append(float(value))
            lower_limits.append(-np.inf)
            upper_limits.append(np.inf)

    return CensoredData(y, lower_limits, upper_limits)

test_data = ['25.7', '17.1', '168', '85.3', '66.4',  '<49.8',  '33.2', '<24.4', '38.3']
data = define_observed_lower_upper(test_data)
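
For reference, the helper produces arrays like these for test_data above: the two '<' entries get a finite lower limit with the observation recorded at their LOQ, and everything else stays unbounded (illustrative output from running the snippet):

print(data.y)
# [25.7, 17.1, 168.0, 85.3, 66.4, 49.8, 33.2, 24.4, 38.3]
print(data.lower_limits)
# [-inf, -inf, -inf, -inf, -inf, 49.8, -inf, 24.4, -inf]
print(data.upper_limits)
# [inf, inf, inf, inf, inf, inf, inf, inf, inf]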

Example of a Left-Censored Model without Imputation

# Bayesian model with pm.Censored
import pymc as pm

with pm.Model() as model:
    # Priors
    mu = pm.Uniform("mu", lower=-20, upper=20)
    log_sigma = pm.Normal("log_sigma", mu=-0.1744, sigma=2.5523)
    sigma = pm.Deterministic("sigma", pm.math.exp(log_sigma))
    precision = pm.Deterministic("precision", 1 / (sigma ** 2))

    # Likelihood with censoring
    censored_observations = pm.Censored(
        "y",
        pm.Lognormal.dist(mu=mu, tau=precision),
        lower=data.lower_limits,
        upper=data.upper_limits,
        observed=data.y,
    )

    # Sampling
    trace = pm.sample(
        draws=25000,
        tune=5000,
        chains=4,
        return_inferencedata=True,
        target_accept=0.90,
        initvals={
            "mu": np.log(0.3),
            "log_sigma": np.log(2.5),
        },
    )

This is (one of) the JAGS models I am converting:

model {

    ## prior on mu
    mu ~ dunif(",mu.lower,",",mu.upper,")

    ## prior on sigma
    precision <- 1/(sigma*sigma)
    sigma <- exp(log.sigma)
    log.sigma ~ dnorm(",log.sigma.mu,",",log.sigma.prec,")

    ### likelihood
    for (i in 1:n.obs) {
        y[i] ~ dlnorm(mu, precision)
        CensorType[i] ~ dinterval(y[i], censorLimitMat[i,])
    }
}

You can impute after sampling is done by drawing from a truncated version of the distribution with the interval reversed. If you had a left-censored observation, you can sample from a distribution right-truncated at the same point, and so on.

This can be done with pm.sample_posterior_predictive in a new model.
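
For anyone landing here later, a minimal sketch of that idea, assuming the trace from the left-censored model above and a single hypothetical left-censored record with LOQ 49.8. The pm.Flat variables are only placeholders so the posterior draws of mu and log_sigma are taken from the trace rather than resampled:

# Impute a left-censored value by sampling from the reversed (right-truncated) distribution
import numpy as np
import pymc as pm

loq = 49.8  # hypothetical LOQ for one left-censored record

with pm.Model() as imputation_model:
    # Placeholders: values come from the existing posterior trace, not from these priors
    mu = pm.Flat("mu")
    log_sigma = pm.Flat("log_sigma")
    sigma = pm.math.exp(log_sigma)

    # Left-censored observation -> LogNormal truncated from above at the LOQ
    imputed_y = pm.Truncated(
        "imputed_y",
        pm.LogNormal.dist(mu=mu, sigma=sigma),
        lower=None,
        upper=loq,
    )

    # Push the posterior draws of mu and log_sigma through the truncated distribution
    imputations = pm.sample_posterior_predictive(trace, var_names=["imputed_y"])

Repeat for each censored record; a right-censored observation would use lower=limit, upper=None instead.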

Thanks for this; it was pretty easy to implement. After reviewing the results, it looks like the original authors actually just integrated out the censored data, similar to how I assume pm.Censored() works. Good to have that trick up my sleeve for the future, though.

I’m seeing some ‘slight’ differences between the results I am getting with my model using pm.Censored() and what was published utilising R + JAGS, but I’ll get in touch with the authors first and make another thread if we cannot figure it out.


Yes, pm.Censored integrates out the logp of the censored values.
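
A quick way to see that (my own sketch, not from the thread): for an observation recorded exactly at the lower censoring bound, pm.Censored assigns the base distribution's log-CDF at that bound, i.e. the mass below the limit is integrated out rather than a value being imputed.

import numpy as np
import pymc as pm

# Base distribution and a left-censoring bound at -1
dist = pm.Normal.dist(mu=0.0, sigma=1.0)
censored = pm.Censored.dist(dist, lower=-1.0, upper=None)

# Log-probability of an observation at the bound vs. the log-CDF there
logp_at_bound = pm.logp(censored, -1.0).eval()
logcdf_at_bound = pm.logcdf(dist, -1.0).eval()

print(np.isclose(logp_at_bound, logcdf_at_bound))  # expected: True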