LOO/WAIC for censored survival data

I’m working with weibull distributed survival data and using DensityDist with a custom log-likelihood function to account for censoring - how do I go about calculating WAIC or LOO properly?

I get the following issue when trying to calculate WAIC for instance (and the SE of the estimate I get is 0.0)

There has been a warning during the calculation. Please check the results.

Here’s the model structure


def weibull_censored_logp(k, lambd, E, T):
    # PDF of Weibull: k / lambda * (x / lambda)^(k-1) * exp(-(t / lambda)^k)
    LL_observed = pm.math.log(k) - pm.math.log(lambd) + (k-1)*(pm.math.log(T) - pm.math.log(lambd)) - (T/lambd)**k
    
    # CDF of Weibull: 1 - exp(-(t / lambda)^k)
    # SF (survival fxn) = C * 1-CDF 
    LL_censored = -(T/lambd)**k
    
    # We need to implement the likelihood using pm.Potential 
    logp = E * LL_observed + (1 - E) * LL_censored
    return logp.sum()


with pm.Model() as m1b:
    
    # hyperpriors
    k_mu = pm.Normal("k_mu", -0.806, .113)
    k_sd = pm.TruncatedNormal("k_sd", 3.5, .5, lower=0)
    
    lam_mu = pm.Normal("lam_mu", 1.24, .15)
    lam_sd = pm.TruncatedNormal("lam_sd", 0.7, .1, lower=0)
    
    # style level parameters
    k = pm.Lognormal('k', mu=pm.math.exp(k_mu), sd=k_sd, shape=len(s_labels))  # Weak prior for k
    lambd = pm.Lognormal('lambd', pm.math.exp(lam_mu), lam_sd, shape=len(s_labels))  # Weak prior
    
    # likelihood
    weibull_surv = pm.DensityDist(
        'weibull_surv', 
        logp=weibull_censored_logp, 
        testval=1.0,
        observed={"k":k[s_], "lambd":lambd[s_], "E":E, "T":T}
    )

The effective sample size and rhat of parameters are quite good, and the model fits fairly quick so I dont think there are any major misspecifications

Any ideas on calculating WAIC/LOO properly?

Some good news - it turns out that by having this line in my custom log likelihood function

return logp.sum()

It was summing the log probabilities across all observations. It seems that typically pymc3 returns a log likelihood distribution for every y-value, so if your dataset is N=10000, you’d have 10000 log likelihood distributions. The solution was to remove the sum operation so it was instead the following:

return logp

I noticed on this PyMC3 doc page that the exponential survival function is recommended to have the logp.sum() operation as shown below

def logp(failure, value):
    return (failure * log(λ) - λ * value).sum()

exp_surv = pm.DensityDist('exp_surv', logp, observed={'failure':failure, 'value':t})

Is there a reason for this? Is it just to speed things up?

My remaining question is the following:
Is WAIC valid for this hierarchical censored survival model? I’m now getting warnings that some log predictive densities are > 0.4