I’ve been playing around with PyMC, and I wanted to ask a beginner’s question about an example on your website: Bayesian Survival Analysis — PyMC example gallery. If anyone is already familiar with that example, maybe they can quickly help me with my question.
In the notebook, a variable called death codes for an observed death in a given time period: it is 0 for no death and 1 for a death. The deaths are then modelled with a pm.Poisson("obs", mu, observed=death) random variable. Since death is 0 or 1, it seems more logical to me to model it with a pm.Bernoulli. If one assumes the same exponential rate mu, then the probability of death in the given exposure window would be given by p = 1 - e^{-mu}.
However, if I replace the line obs = pm.Poisson("obs", mu, observed=death) with obs = pm.Bernoulli("obs", p=1 - T.exp(-mu), observed=death), the sampler produces nothing but divergences. Does anyone know why, or where I can find the information I need to solve this problem? Alternatively, if my approach cannot work, can someone explain why a Poisson distribution should be used?
Good questions. There is both a Poisson and a binomial approximation to the Cox model. Because deaths are rare events, it turns out not to matter which one you use here.
The reason the binomial is not working for you is that you have not constrained the probability to lie on the unit interval. You want a logit transformation, rather than a log.
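A minimal numeric sketch of that point (my own illustration, not from the notebook): if a sampler proposal lets mu stray slightly negative, the direct transform 1 - exp(-mu) produces an invalid probability, while a sigmoid (inverse-logit) link can never leave the unit interval.

```python
import numpy as np

mu = -0.1  # a proposal where mu strays below zero

p_direct = 1 - np.exp(-mu)         # negative: not a valid probability
p_sigmoid = 1 / (1 + np.exp(-mu))  # always strictly inside (0, 1)

print(p_direct, p_sigmoid)
```

This is why a logit-parameterized Bernoulli stays well-behaved for any real-valued input, whereas the log-link version silently hands the likelihood an impossible probability whenever mu dips below zero.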
Thanks for the answer; I appreciate it. Am I correct to say that 1 - T.exp(-mu) is always going to be between 0 and 1 in the example? To me, it looks like mu has to be nonnegative. Is there somewhere I can read more about how to do this, or about why I have to use a logit transformation rather than a log? I like the Poisson approximation, but I cannot easily convince myself that death after mastectomy in patients with metastasized breast cancer is a rare event. Again, thanks for your answer!
You are probably right about the transformation in this case, so I would have to take a peek and see why it’s failing.
As far as rare events go, it’s rare in the sense that each event happens either once or never, so the Poisson mean would be small. While a binomial (Bernoulli) may seem more natural, the Cox survival model is directly related to Poisson regression through a clever restructuring of the survival data. So if you want to use a Bernoulli, it’s not just a matter of swapping it in for the Poisson; it requires a different structuring of the data. It is a proportional-odds (rather than proportional-hazards) model and is better suited to discrete time than continuous time.
Something to think about! It’s more complicated than I thought, then. I’m still curious to find out why it doesn’t work, since I don’t really get that yet. Thank you for your feedback.
Because sampling doesn’t require normalization, if you have a Poisson truncated to just 0 and 1 observations, it’s going to be fine for analysis (not for generation, so it’s suspect). This works out as follows: the truncated Poisson puts unnormalized mass e^{-mu} on 0 and mu * e^{-mu} on 1, which after normalization is a Bernoulli with P(1) = mu / (1 + mu).
Oh! That is simple and quite clever. So a pm.Poisson and a pm.Bernoulli are equivalent (up to an approximation) if all observations are 0 or 1. Thanks for that insight. Do you maybe also know why it didn’t work using pm.Bernoulli?
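A quick numeric check of this equivalence with scipy (my own sketch, not from the thread): for a small rate mu, the Poisson and the Bernoulli with p = 1 - exp(-mu) assign the same log-probability to a 0, and nearly the same log-probability to a 1.

```python
import numpy as np
from scipy.stats import bernoulli, poisson

mu = 0.01  # small rate, i.e. a rare event

# Poisson log-pmf restricted to the observed support {0, 1}
lp0 = poisson.logpmf(0, mu)   # equals -mu
lp1 = poisson.logpmf(1, mu)   # equals log(mu) - mu

# Bernoulli with the matching exponential-rate probability
p = 1 - np.exp(-mu)
lb0 = bernoulli.logpmf(0, p)  # log(1 - p) = -mu, same as the Poisson
lb1 = bernoulli.logpmf(1, p)  # ~ log(mu) - mu/2 for small mu

print(lp0 - lb0)  # ~ 0: agreement on the zeros
print(lp1 - lb1)  # ~ -mu/2: vanishes as mu -> 0
```

So the two likelihoods coincide on the non-events and differ only by a term of order mu/2 on the events, which is why the rare-event regime makes the choice immaterial.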
BUGS and JAGS would use this to sneak in arbitrary log densities. If you write 0 ~ poisson(-lambda) then it adds lambda to the log density. Here’s a section of a PyMC tutorial exploring that idea using a survival model as an example.
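To make that trick concrete (my own sketch, not from the tutorial linked above): observing a 0 under Poisson(rate) contributes exactly -rate to the log density, which is why a “negative rate” of -lambda smuggles +lambda in. In PyMC the sanctioned way to add an arbitrary term to the log density is pm.Potential.

```python
import numpy as np
from scipy.stats import poisson

lam = 2.3

# Poisson log-pmf at 0 is just -rate: the rate**k / k! factor
# vanishes at k = 0, leaving only exp(-rate).
contrib = poisson.logpmf(0, lam)
print(contrib)  # -2.3

# So "0 ~ poisson(-lambda)" in BUGS/JAGS adds +lambda to the log density.
# In PyMC, the idiomatic equivalent of such an arbitrary additive term is
#     pm.Potential("extra_loglik", lam)
```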
You have to be careful here: in general they are not.
The Poisson approximation to the Cox PH model works because of the definition of the partial likelihood function. We can basically ‘add’ a bunch of Poisson terms that correspond to no event (in the augmented dataset, where we create a bunch of pseudo-observations for each individual and interval within our observations), and the Poisson likelihood (whose means are a function of the time and hazard ratio within each individual–interval pair) becomes a very good discrete approximation to the Cox PH model: Statistics and Population (note: this course does not take a Bayesian approach, but the logic follows in the derivation of the PL function).
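A small sketch of the data augmentation described above (a piecewise-exponential setup with made-up data; the interval cuts and variable names are my own, not from the course): each subject is expanded into one pseudo-observation per interval in which they are at risk, carrying an exposure time and a 0/1 event indicator that a Poisson likelihood can then consume.

```python
import numpy as np

# Hypothetical follow-up data: time to event/censoring, event indicator
times = np.array([2.5, 4.0, 1.2])
events = np.array([1, 0, 1])
cuts = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])  # interval boundaries

rows = []  # (interval start, exposure, died) pseudo-observations
for t, e in zip(times, events):
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        if t <= lo:
            break  # subject is no longer at risk in this interval
        exposure = min(t, hi) - lo
        died = int(bool(e) and t <= hi)
        rows.append((lo, exposure, died))

# Each pseudo-observation then gets a likelihood term like
#     obs ~ Poisson(hazard[interval] * exposure * hazard_ratio),
# where hazard_ratio = exp(beta @ x) for the subject's covariates x.
```

Note that the total exposure across the pseudo-observations equals the total observed follow-up time, and the event indicators sum to the number of deaths, so no information is lost in the restructuring.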
For the Bernoulli approximation, I’m not sure how it’s usually done outside of some non-parametric approaches that have a proportional-odds term. For those approaches that do not, generally the time of observation is regressed on the covariates, and this latent variable is then passed to a probit regression or similar.
Thanks for the link; I appreciate the help and I’m going through the material. If someone knows why the divergences appear, that would still be very informative to me.