Hello all,
I am fairly new to pymc and probabilistic programming in general. I’ve been trying to use MCMC for the treatment of dataset with left-censored data and found this very nice example.
I’ve run some tests with the model and I get good results, but I have troubles extracting samples from the left censored posterior distribution. The example says that the imputed censored model is capable of generating data sets of censored values (sample from the posteriors of left_censored
and right_censored
to generate them), however when sampling from the posteriors I get data from the entire distribution and not only the censored values. A minimal repro is below. What I am doing wrong?
I also have a more general question related to this topic: for a simple distribution like in this case, is there a reason for using the sample_posterior_predictive
function instead of simply drawing random samples from a newly defined distribution with the parameters (sigma and mu) obtained from the MCMC simulation?
Thanks a lot already,
best regards
Minimal Repro
import numpy as np
import pymc as pm
import matplotlib.pyplot as plt
import seaborn as snsDefine input data
N_data = 150
mu_th = 1.5
sigma_th = 0.8
Q = 0.5Generate random data
Y_data = np.random.normal(loc=mu_th, scale=sigma_th, size=N_data)
Censor data
RL = np.quantile(Y_data, Q)
Y_obs = Y_data[Y_data>RL]
N_cen = len(Y_data[Y_data<=RL])
N_obs = len(Y_obs)with pm.Model() as imputed_censored_model:
mu = pm.Normal(“mu”, mu = np.mean(Y_obs), sigma = 1)
sigma = pm.LogNormal(“sigma”, mu = np.std(Y_obs), sigma = 1)left_censored = pm.Normal("left_censored", mu,sigma, transform=pm.distributions.transforms.Interval(None, RL), shape=N_cen, initval=np.full(N_cen, RL - 1)) observed = pm.Normal("obs", mu=mu, sigma=sigma, observed=Y_obs, shape=N_obs) # Sampling trace = pm.sample(1000, return_inferencedata=True) # Posterior predictive posterior_predictive_censored = pm.sample_posterior_predictive(trace, var_names = ['left_censored']) posterior_predictive_obs = pm.sample_posterior_predictive(trace, var_names = ['obs'])
Extract values
ppc_val = posterior_predictive_censored.posterior_predictive[‘left_censored’].values.flatten()
ppo_val = posterior_predictive_obs.posterior_predictive[‘obs’].values.flatten()Figure
fig, axs = plt.subplots()
sns.kdeplot(ppc_val, ax = axs, label = ‘left censored’)
sns.kdeplot(ppo_val, ax = axs, ls = ‘–’, label = ‘obs’)
axs.legend()