Help me understand pm.Censored()

areding · March 14, 2022, 6:49am

My understanding is that for right-censored data, your observation should be the minimum possible value. Your trial is over, but the machine or whatever still hasn’t failed, so you know the lifetime of that machine is at least what it’s shown so far. But when I look at pm.Censored, it’s using at.clip and returns the upper value if x is greater than the upper value. Could you help me understand how the code works here?

ricardoV94 · March 14, 2022, 7:21am

My understanding is that for right censored data, observations would be capped at the maximum observable value (which is indeed the minimum value they could have taken).

areding · March 14, 2022, 7:53am

Yes, I think you put it more precisely than I did. Observations from your experiment would at maximum be equal to the upper limit. I think where I’m having trouble is understanding how pm.Censored is allowing PyMC to use that upper limit as a lower limit when sampling.

OriolAbril · March 14, 2022, 8:17am

I think the first figure in this notebook (and the whole notebook) can help clarify this: Bayesian regression with truncated or censored data — PyMC documentation.

To generate random draws from a censored distribution, you sample from the uncensored distribution (transparent grey dots in the plot) and then clip those to fulfill the censoring constraint (solid black dots in the plot). This effectively means capping/clipping/reducing the observations above the upper limit to be equal to the upper limit, precisely because it is an upper limit and there can’t be values higher than that. However, even if “reducing” the value of these uncensored draws, the upper limit is not being used as a lower limit.

areding · March 14, 2022, 8:51am

I had seen that notebook, but I should have read it more carefully. Now I understand the clipping part in rv_op is for generating censored observations.

I also saw this part in the censored regression model section:

Behind the scenes, pm.Censored adjusts the likelihood function to take into account that:

the probability at the lower bound is equal to the cumulative distribution function from −∞ to the lower bound,

the probability at the upper bound is equal to the the cumulative distribution function from the upper bound to ∞.

This is what I was clumsily trying to ask about. Is this all accomplished with the get_moment_censored() method, or is there something more in the code that I should look for?

Thank you @OriolAbril and @ricardoV94 for your help!

areding · March 14, 2022, 9:04am

Sorry, I have one more related question. I’m currently working on translating an old Bayesian stats course to PyMC from OpenBUGS. When you use censoring in BUGS, it also gives you imputed values for each censored observation. For example the results might look like this:

|               |            mean | sd     | MC_error | val2.5pc | median  | val97.5pc | start | sample |
|---------------|-----------------|--------|----------|----------|---------|-----------|-------|--------|
| lambda        | 0.02281         | 0.0086 | 3.66E-05 | 0.009229 | 0.02169 | 0.04248   | 1001  | 100000 |
| mu            | 51.07           | 22.61  | 0.1018   | 23.54    | 46.09   | 108.4     | 1001  | 100000 |
| observed[2]   | 122.8           | 60.45  | 0.2442   | 73.12    | 103.5   | 284.5     | 1001  | 100000 |
| observed[4]   | 110.8           | 59.93  | 0.2145   | 61.09    | 91.81   | 269.8     | 1001  | 100000 |
| observed[10]  | 72.01           | 60.28  | 0.2155   | 22.13    | 52.89   | 232.6     | 1001  | 100000 |

Is there a way to do this with pm.Censored(), or would I need to switch to something like this model?

OriolAbril · March 14, 2022, 9:29am

The get_moment method is used to find initial values for sampling, and has no effect on the logp of the distribution. I think that the logp computation for Censored distributions is deferred to aeppl, but I’ll let Ricardo answer this as he’ll be able to do it much better and without having to guess.

As for the 2nd question, it should be a new topic to keep things organized and to make finding the question easier (both to possible answerers who follow imputation but not censoring questions or who might have already read and unsubscribed from this specific discussion) and to people with the same question searching for answers in the future.

ricardoV94 · March 14, 2022, 9:49am

As @OriolAbril mentioned the logp is defined in Aeppl here: aeppl/truncation.py at 751979802f1aef5478fdbf7cc1839df07df60825 · aesara-devs/aeppl · GitHub

For your other question, pm.Censored does not do automatic imputation, as that is less efficient. You could follow the strategy in that last notebook you referred to if you wanted that.

Topic		Replies	Views
Bounding Output and Potential Values Questions	4	1529	June 20, 2018
Issue with the accuracy of reconstructed data for censored data analysis	10	125	June 10, 2024
Help with Censored Regression Questions	23	3607	January 24, 2021
Truncated observed data? Questions	6	2306	November 6, 2020
Sample posterior predictive with censored data Questions	3	522	April 7, 2023

Help me understand pm.Censored()

Related topics