Generating censored data from posterior distribution

Baccandr · May 28, 2024, 8:48pm

Hello all,
I am fairly new to pymc and probabilistic programming in general. I’ve been trying to use MCMC for the treatment of dataset with left-censored data and found this very nice example.

I’ve run some tests with the model and I get good results, but I have troubles extracting samples from the left censored posterior distribution. The example says that the imputed censored model is capable of generating data sets of censored values (sample from the posteriors of left_censored and right_censored to generate them), however when sampling from the posteriors I get data from the entire distribution and not only the censored values. A minimal repro is below. What I am doing wrong?

I also have a more general question related to this topic: for a simple distribution like in this case, is there a reason for using the sample_posterior_predictive function instead of simply drawing random samples from a newly defined distribution with the parameters (sigma and mu) obtained from the MCMC simulation?

Thanks a lot already,
best regards

Minimal Repro

import numpy as np
import pymc as pm
import matplotlib.pyplot as plt
import seaborn as sns

Define input data

N_data = 150
mu_th = 1.5
sigma_th = 0.8
Q = 0.5

Generate random data

Y_data = np.random.normal(loc=mu_th, scale=sigma_th, size=N_data)

Censor data

RL = np.quantile(Y_data, Q)
Y_obs = Y_data[Y_data>RL]
N_cen = len(Y_data[Y_data<=RL])
N_obs = len(Y_obs)

with pm.Model() as imputed_censored_model:
mu = pm.Normal(“mu”, mu = np.mean(Y_obs), sigma = 1)
sigma = pm.LogNormal(“sigma”, mu = np.std(Y_obs), sigma = 1)
left_censored = pm.Normal("left_censored", mu,sigma,
transform=pm.distributions.transforms.Interval(None, RL),
shape=N_cen, initval=np.full(N_cen, RL - 1))

observed = pm.Normal("obs", mu=mu, sigma=sigma, observed=Y_obs, shape=N_obs)
# Sampling
trace = pm.sample(1000, return_inferencedata=True)

# Posterior predictive
posterior_predictive_censored = pm.sample_posterior_predictive(trace, var_names = ['left_censored'])
posterior_predictive_obs = pm.sample_posterior_predictive(trace, var_names = ['obs'])
Extract values

ppc_val = posterior_predictive_censored.posterior_predictive[‘left_censored’].values.flatten()
ppo_val = posterior_predictive_obs.posterior_predictive[‘obs’].values.flatten()

Figure

fig, axs = plt.subplots()
sns.kdeplot(ppc_val, ax = axs, label = ‘left censored’)
sns.kdeplot(ppo_val, ax = axs, ls = ‘–’, label = ‘obs’)
axs.legend()

ricardoV94 · May 28, 2024, 10:28pm

I think you actually have Truncated data not Censored. In any case you could try to use pm.Censored or pm.Truncated

Baccandr · May 29, 2024, 10:59am

Thank you Ricardo,
yes you are right, in the example the data are truncated but in my real application I am working with censored data. In this model I am removing the censored data but using the information on their number which would not be available with a truncated dataset.
I’ve used pm.Censored and that works fine but I wanted to use an imputed censored model to be able to generate plausible sets of censored data. From my understanding that’s not possible with pm.Censored and it’s also explained in the example I’ve linked above.

So, if you have ideas regarding my question above that would be great.

ricardoV94 · May 29, 2024, 5:45pm

You can impute during sample_posterior_predictive as demonstrated in this blogpost: Out of model predictions with PyMC - PyMC Labs

See section “Predicting uncensored variables”. You basically add a new variable equivalent to the “dist” of the Censored. Should be equivalent to imputing with mcmc during pm.sample, but faster and easier to write down.

Baccandr · June 10, 2024, 10:36am

Thank you very much Ricardo and sorry for my slow answer.
The blogpost is very clear and I was able to implement the solution you suggested.
Unfortunately, I’m not getting the results I expected but I’ve asked this in a separate thread.

Topic		Replies	Views
Sample posterior predictive with censored data Questions	3	522	April 7, 2023
Issue with the accuracy of reconstructed data for censored data analysis	10	125	June 10, 2024
Modelling censored data in pymc v4: a simple Gaussian random effects example v5 development , modeling	0	387	August 30, 2022
Dealing with imputation of left censored multivariate normal distribution Questions	7	1589	June 20, 2018
Is there a way to generate synthetic data version agnostic modeling	4	994	July 15, 2022

Generating censored data from posterior distribution

Minimal Repro

Define input data

Generate random data

Censor data

Extract values

Figure

Related topics