Weibull Survival Regression AFT

bacalfa · April 19, 2018, 1:19pm

I came across this tutorial on accelerated time failure survival regression modeling: http://austinrochford.com/posts/2017-10-02-bayes-param-survival.html. I’m interested in the Weibull part.

It’s not clear to me how to model the case with an intercept. See http://personal.psu.edu/drh20/525/weekly/weibull.pdf. The regression model seems to be this one:

\log (T) = \mu + \sigma W + \gamma^\top X

I’m ultimately interested in the Weibull parameters \beta = 1/\sigma (the inverse of the scale) and \eta = \exp(\mu) (the exponential of the intercept). In addition, if I have no predictors, i.e., X = 0, how can I capture that in the Gumbel distribution (not forgetting the intercept \mu)?

Here’s some code, but I don’t think it’s correct:

import pymc3 as pm
from theano import shared, tensor as tt

y = np.log(time)
y_std = (y - y.mean()) / y.std() # Not sure if I need this because of the intercept mu
cens = event == 0

# cens_ = shared(cens) # There are no predictors

def gumbel_sf(y, mu, sigma):
    return 1.0 - tt.exp(-tt.exp(-(y - mu) / sigma))

# Build Bayesian model
with pm.Model() as model:
    # http://personal.psu.edu/drh20/525/weekly/weibull.pdf
    # log(T) = mu + sigmaW + gamma^\top z
    # where gamma = 0 (no predictors), mu ~ N(0, 10), and sigmaW ~ Gumbel(0, s) with s ~ HalfNormal(5)

    # Hyperprior
    s = pm.HalfNormal("s", tau=5.0)

    # Priors
    mu = pm.Normal("mu", mu=0, sd=10)

    # Likelihood for uncensored and censored survival times
    y_obs = pm.Gumbel("y_obs", mu=mu, beta=s, observed=y_std[~cens])
    y_cens = pm.Bernoulli("y_cens", p=gumbel_sf(y_std[cens], mu=mu, sigma=s), observed=np.ones(cens.sum()))

    # Initialization
    start = pm.find_MAP()

# Perform MC sampling
with model:
    trace = pm.sample(draws=2000, start=start, tune=1000)

How do I obtain \beta and \eta after the sampling is done?

junpenglao · April 19, 2018, 3:37pm

The intercept is already included in the original model: see the matrix of covariates X contains a columns of 1s.

In your code above, since the linear regression model only contains a intercept mu, you can compute the eta = np.exp(trace['mu']) and beta = 1/trace['s']

bacalfa · April 19, 2018, 4:45pm

Thanks for the observation about the X array.

By the way, I had to modify the code because I was getting an error in Theano (indexing with bool mask).

import pymc3 as pm
from theano import shared, tensor as tt

y = np.log(time)
y_std = (y - y.mean()) / y.std()
event_ = shared(event)
X = np.ones((len(time), 1))

X_ = shared(X)

def gumbel_sf(y, mu, sigma):
    return 1.0 - tt.exp(-tt.exp(-(y - mu) / sigma))

# Build Bayesian model
with pm.Model() as model:
    # http://personal.psu.edu/drh20/525/weekly/weibull.pdf
    # log(T) = sigmaW + gamma^\top X
    # where gamma = 1 (intercept only) and sigmaW ~ Gumbel(0, s) with s ~ HalfNormal(5)

    # Hyperprior
    s = pm.HalfNormal("s", tau=5.0)

    # Priors
    gamma = pm.Normal("gamma", 0., 5.0, shape=X.shape[1])
    gammaX = gamma.dot(X_.T)

    # Likelihood for uncensored and censored survival times
    y_obs = pm.Gumbel("y_obs", mu=gammaX[(event == 1).nonzero()], beta=s, observed=y_std[~cens])
    y_cens = pm.Bernoulli("y_cens", p=gumbel_sf(y_std[cens], mu=gammaX[(event == 0).nonzero()], sigma=s), observed=np.ones(cens.sum()))

    # Initialization
    start = pm.find_MAP()

# Perform MC sampling
with model:
    trace = pm.sample(draws=2000, start=start, tune=1000)

Regarding the computation of \beta, s is the hyperprior of one of the hyperparameters of a Gumbel distribution. So I don’t think \beta = 1/s is correct. It looks like \sigma \sim Gumbel(0, s) and then \beta = 1/\sigma. Any comments?

Also, I’m getting very low values for \eta. If I fit the model in R with survreg, I get very different values. Here’s the R code:

time <-c(59, 115, 156, 421, 431, 448, 464, 475, 477, 563, 638, 744, 769, 770, 803, 855, 1040, 1106, 1129, 1206, 1227, 268, 329, 353, 365, 377)
event <- c(1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0)
library(survival)
r <- survreg(Surv(time, event) ~ 1, dist="weibull")
beta <- 1/r$scale
eta <- exp(r$coefficients[1])

And I get \beta = 1.10806 and \eta = 1225.419 . The PyMC3 code above gives me an average gamma of -0.185689, and thus eta = np.exp(trace["gamma"]) gives values nowhere near 1225.419.

junpenglao · April 20, 2018, 4:42pm

I don’t think the model you are implementing here is exactly the same as the one in R.

bacalfa · April 20, 2018, 6:36pm

Oh… I thought that original article was supposed to show how one would implement survival regression using the AFT approach. This is what R’s survreg function does. And you can select the distribution for the time data (in this case, I’m interested in Weibull). If that’s not the correct approach, what would it be then? Maybe @AustinRochford could chime in?

Topic		Replies	Views
Limiting time on Survival Analysis Questions	0	496	June 14, 2019
Bayesian Survival Function Questions	4	610	April 23, 2019
Weibull AFT Fitting Questions	0	610	October 12, 2020
Modeling Weibull Proportional Hazard version agnostic modeling	1	593	April 9, 2022
How to use pymc3 for survival analysis? Questions	6	2017	August 5, 2021

Weibull Survival Regression AFT

Related topics