Bayesian Survival Function

This is a crosspost:

I have managed to estimate the posterior of the latent variables of my model which can be stated as follows (adapted from

y_i = \beta^T x_i + \xi_i

where y_i\equiv \log T_i, is the log duration and,

\xi_i \sim Gumbel(0, s) \\ s \sim HalfNormal(5)

Note that this can be equivalently stated that y_i\sim Gumbel(\beta^T x_i, s).

Therefore, the likelihood of the dataset for both censored and uncensored data is:

\log p\left(\cup_{i=1}^N y_i|\beta, s\right) = \sum_{i \in uncensored} \log p\left(y_i=\log T_i|\beta, s\right) + \sum_{i \in censored} \log p\left(y_i>\log T_i|\beta, s\right)

The question is regarding getting a Bayesian estimate of the survival function. The author of the above notebook, samples from the posterior predictive distribution of y using:

with weibull_model:
    post_trace = pm.sample_posterior_predictive(trace, vars=[y_obs])

and uses the samples of y (converted back to time domain) to generate the survival function with the following logic (for time steps 1 to 200):

for t in range(1, 201):
    frac = sum(t_samples > t) / len(t_samples)
    survival_func[t] = frac

It makes logical sense to state that the fraction of samples greater than a particular time is the survival rate. However, this is a point estimate of the survival function.

There’s two ways I can think of getting a ‘Bayesian’ estimate of the Survival Function:

  1. Sample N number of traces (with say 1000 points) of the posterior predictive distribution and do:
for n in range(N):
    for t in range(1, 201):
        frac = sum(t_samples[n] > t) / len(t_samples[n])
        survival_func[N, t] = frac
  1. We know the CDF of the gumbel distribution given the latent variables to be, \exp\left(-\exp\left(\left(\log T_i - X_i^T\beta\right)/s\right)\right). Hence we can let the j-th sample for the survival function for a given value of T to be 1 - \exp\left(-\exp\left(\left(\log T_i - X_i^T\beta^{(j)}\right)/s^{(j)}\right)\right) where \beta^{(j)}, s^{(j)} are the samples from the posterior of the latent variables.

Are the two methods equivalent? If not which method is the more correct way of getting a Bayesian sample of the Survival Function.


I feel like I should point out after @Gon_F’s answer that in method 2. What I am calculating is the expectation of \exp\left(-\exp\left(\left(\log T - X^T\beta\right)/s\right)\right) under the distribution p(\beta, s| \mathcal{D}_{train}) where \mathcal{D} is the training dataset. Whereas, the first method is looking directly at the distribution, p(T| \mathcal{D}_{train}) where, \beta, s has already been marginalised out. So I’m wondering if (1) is an approximation to (2) after all. Hopefully I’m not complicating things more.


I love this question.

The way it is written, (1) is a Monte Carlo approximation to (2). In order to sample from the density in (2) you would:

(i) Sample \beta^{(j)}, s^{(j)} from P_\mathrm{post}(\beta, s)

(ii) Sample y_i^{(j)} from \mathrm{Gum}(x_i^T\beta^{(j)}, s)

(iii) Take T_i^{(j)} = \exp(y_i^{(j)})

This is precisely the content of the trace for y_i.


If we do find an effective way to do Survival functions, it would open a path to implementing joint models, like Stan has, which allow modelling evolving Time-to-Event’s (TTE’s) over time, which I had been looking at recently.

Maybe @AustinRochford, can comment on this

To be clear set \mathcal{D}_{train} = (X, y); and note that y only influences the predictive posterior p(T|X, y) through its influence on the posteriors p(\beta, x | X, y):

p(T|X, y) = \int_{\beta, s} p(T|\beta, s, X)p(\beta, s|X, y)d\beta ds


p(T|D) = \int_{\beta ,s} p(T|\beta, s, D)p(\beta, s|D)d\beta ds

So sampling from p(\beta, s|D) and then from p(T|\beta, s) and computing computing \frac{1}{K}\sum_{i=1}^kT_i or \frac{\sum_{i=1}^K T_i P(T_i|\beta_i,s_iX)p(\beta_i, s_i | D)}{\sum p(\beta_i, s_i|D)} is a monte-carlo approximation to the (numerical) expectation \int T p(T|D) d T

1 Like