Help me understand pm.Censored()

My understanding is that for right-censored data, your observation should be the minimum possible value. Your trial is over, but the machine or whatever still hasn’t failed, so you know the lifetime of that machine is at least what it’s shown so far. But when I look at pm.Censored, it’s using at.clip and returns the upper value if x is greater than the upper value. Could you help me understand how the code works here?

My understanding is that for right censored data, observations would be capped at the maximum observable value (which is indeed the minimum value they could have taken).

Yes, I think you put it more precisely than I did. Observations from your experiment would at maximum be equal to the upper limit. I think where I’m having trouble is understanding how pm.Censored is allowing PyMC to use that upper limit as a lower limit when sampling.

I think the first figure in this notebook (and the whole notebook) can help clarify this: Bayesian regression with truncated or censored data — PyMC documentation.

To generate random draws from a censored distribution, you sample from the uncensored distribution (transparent grey dots in the plot) and then clip those to fulfill the censoring constraint (solid black dots in the plot). This effectively means capping/clipping/reducing the observations above the upper limit to be equal to the upper limit, precisely because it is an upper limit and there can’t be values higher than that. However, even if “reducing” the value of these uncensored draws, the upper limit is not being used as a lower limit.

I had seen that notebook, but I should have read it more carefully. Now I understand the clipping part in rv_op is for generating censored observations.

I also saw this part in the censored regression model section:

Behind the scenes, pm.Censored adjusts the likelihood function to take into account that:

  • the probability at the lower bound is equal to the cumulative distribution function from −∞ to the lower bound,
  • the probability at the upper bound is equal to the the cumulative distribution function from the upper bound to ∞.

This is what I was clumsily trying to ask about. Is this all accomplished with the get_moment_censored() method, or is there something more in the code that I should look for?

Thank you @OriolAbril and @ricardoV94 for your help!

1 Like

Sorry, I have one more related question. I’m currently working on translating an old Bayesian stats course to PyMC from OpenBUGS. When you use censoring in BUGS, it also gives you imputed values for each censored observation. For example the results might look like this:

|               |            mean | sd     | MC_error | val2.5pc | median  | val97.5pc | start | sample |
|---------------|-----------------|--------|----------|----------|---------|-----------|-------|--------|
| lambda        | 0.02281         | 0.0086 | 3.66E-05 | 0.009229 | 0.02169 | 0.04248   | 1001  | 100000 |
| mu            | 51.07           | 22.61  | 0.1018   | 23.54    | 46.09   | 108.4     | 1001  | 100000 |
| observed[2]   | 122.8           | 60.45  | 0.2442   | 73.12    | 103.5   | 284.5     | 1001  | 100000 |
| observed[4]   | 110.8           | 59.93  | 0.2145   | 61.09    | 91.81   | 269.8     | 1001  | 100000 |
| observed[10]  | 72.01           | 60.28  | 0.2155   | 22.13    | 52.89   | 232.6     | 1001  | 100000 |

Is there a way to do this with pm.Censored(), or would I need to switch to something like this model?

The get_moment method is used to find initial values for sampling, and has no effect on the logp of the distribution. I think that the logp computation for Censored distributions is deferred to aeppl, but I’ll let Ricardo answer this as he’ll be able to do it much better and without having to guess.

As for the 2nd question, it should be a new topic to keep things organized and to make finding the question easier (both to possible answerers who follow imputation but not censoring questions or who might have already read and unsubscribed from this specific discussion) and to people with the same question searching for answers in the future.

1 Like

As @OriolAbril mentioned the logp is defined in Aeppl here: aeppl/truncation.py at 751979802f1aef5478fdbf7cc1839df07df60825 · aesara-devs/aeppl · GitHub

For your other question, pm.Censored does not do automatic imputation, as that is less efficient. You could follow the strategy in that last notebook you referred to if you wanted that.

1 Like