Yesterday I uploaded a demo notebook on truncated regression ("Example notebook for truncated regression") and I am now trying to extend it to censored regression (aka Tobit regression). This is similar to the unanswered question "Regression with censored response variable".
I’ve learnt a bit from this notebook https://docs.pymc.io/notebooks/censored_data.html which presents a simple example of estimation of a mean and sd of 1D censored data. It presents a model which imputes the values of censored data…
```python
import pymc3 as pm

# `samples`, `censored`, `low` and `high` are defined earlier in the linked notebook.

# Imputed censored model
n_right_censored = len(samples[samples >= high])
n_left_censored = len(samples[samples <= low])
n_observed = len(samples) - n_right_censored - n_left_censored

with pm.Model() as imputed_censored_model:
    mu = pm.Normal('mu', mu=0., sigma=(high - low) / 2.)
    sigma = pm.HalfNormal('sigma', sigma=(high - low) / 2.)
    # Latent (imputed) values for the points above the upper bound
    right_censored = pm.Bound(pm.Normal, lower=high)(
        'right_censored', mu=mu, sigma=sigma, shape=n_right_censored
    )
    # Latent (imputed) values for the points below the lower bound
    left_censored = pm.Bound(pm.Normal, upper=low)(
        'left_censored', mu=mu, sigma=sigma, shape=n_left_censored
    )
    # Likelihood of the points that fall inside the bounds
    observed = pm.Normal(
        'observed', mu=mu, sigma=sigma, observed=censored, shape=n_observed
    )
```
Although note that the observed data in that model is in fact truncated data (where the points outside the bounds are removed), not censored data.
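To make the truncated/censored distinction concrete, here is a small NumPy sketch. The names `samples`, `low` and `high` follow the snippet above, but the bounds, sample size and seed are values I made up for illustration; they are not taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily
low, high = -1.0, 1.0                # assumed bounds, for illustration only
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

# Truncated: points outside the bounds are dropped, so the sample shrinks.
truncated = samples[(samples > low) & (samples < high)]

# Censored: every point is kept, but values beyond a bound are recorded
# at that bound.
censored = np.clip(samples, low, high)

n_on_bound = int(np.sum((censored == low) | (censored == high)))
print("original:", len(samples))
print("truncated:", len(truncated))
print("censored:", len(censored), "of which on a bound:", n_on_bound)
```

With censoring you still know how many points fell beyond each bound, whereas with truncation that count is lost.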
Presumably, in a regression context, you could modify this approach to set `mu` as a function of x.
So referring to the example figure below, would a sensible approach be to:
- Estimate `sd` from truncated data (i.e. data within the censor bounds)
- Split the censored data up into left and right sets (ie
- Impute their y values as in the example notebook and code snippet above, BUT where `mu` is a function of the x coordinates of the censored data.
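A rough sketch of the steps above, assuming a straight-line model. The slope `m`, intercept `c`, their priors, the helper names `split_by_censoring` and `build_imputed_censored_regression`, and the simulated data are all hypothetical choices of mine, not anything from the notebook. The `pm.Bound` usage mirrors the notebook snippet (PyMC3 API), and I have not verified that this model samples well.

```python
import numpy as np


def split_by_censoring(x, y, low, high):
    """Split (x, y) into left-censored, right-censored and uncensored parts."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    left = y <= low
    right = y >= high
    inside = ~(left | right)
    return x[left], x[right], x[inside], y[inside]


def build_imputed_censored_regression(x, y, low, high):
    """Hypothetical regression version of the notebook's imputed model."""
    import pymc3 as pm  # imported here so the split above works without PyMC3

    x_left, x_right, x_obs, y_obs = split_by_censoring(x, y, low, high)
    with pm.Model() as model:
        m = pm.Normal("m", mu=0.0, sigma=10.0)      # slope (assumed prior)
        c = pm.Normal("c", mu=0.0, sigma=10.0)      # intercept (assumed prior)
        sigma = pm.HalfNormal("sigma", sigma=10.0)  # noise sd (assumed prior)
        # Impute y for censored points, with mu depending on their x values
        if len(x_right) > 0:
            pm.Bound(pm.Normal, lower=high)(
                "right_censored", mu=m * x_right + c, sigma=sigma,
                shape=len(x_right),
            )
        if len(x_left) > 0:
            pm.Bound(pm.Normal, upper=low)(
                "left_censored", mu=m * x_left + c, sigma=sigma,
                shape=len(x_left),
            )
        # Likelihood of the points inside the censor bounds
        pm.Normal("observed", mu=m * x_obs + c, sigma=sigma, observed=y_obs)
    return model


# Simulated censored data (all values are made up for illustration)
rng = np.random.default_rng(0)
low, high = -2.0, 2.0
x = rng.uniform(-3.0, 3.0, size=200)
y = np.clip(1.5 * x + 0.5 + rng.normal(0.0, 1.0, size=200), low, high)

x_left, x_right, x_obs, y_obs = split_by_censoring(x, y, low, high)
print(len(x_left), "left-censored,", len(x_right), "right-censored,",
      len(x_obs), "inside the bounds")
```

To fit it you would call `build_imputed_censored_regression(x, y, low, high)` and then `pm.sample()` inside the returned model's context.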
- Does this approach sound reasonable?
- Can anyone explain if and why we should be using `TruncatedNormal` in this context of censored data?
- In the code example above (https://docs.pymc.io/notebooks/censored_data.html) I can’t work out why the imputation of `left_censored` makes any difference to the estimate of `sigma` (it does, I’ve checked). Can anyone explain how that works?
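For context on the last question, this is the textbook Tobit likelihood as I understand it, written for a straight-line model. The slope $m$ and intercept $c$ are names I am assuming; $\phi$ and $\Phi$ are the standard normal pdf and cdf, and "obs", "left" and "right" index the uncensored, left-censored and right-censored points.

$$
\mathcal{L}(m, c, \sigma) =
\prod_{i \in \text{obs}} \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - \mu_i}{\sigma}\right)
\prod_{i \in \text{left}} \Phi\!\left(\frac{\text{low} - \mu_i}{\sigma}\right)
\prod_{i \in \text{right}} \left[1 - \Phi\!\left(\frac{\text{high} - \mu_i}{\sigma}\right)\right],
\qquad \mu_i = m x_i + c
$$

In this form $\sigma$ appears in the censored terms as well as the observed ones, so the censored points would carry information about $\sigma$. Whether the imputation model is effectively doing the same thing is the part I would like confirmed.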