Some good news - it turns out that by having this line in my custom log likelihood function
return logp.sum()
It was summing the log probabilities across all observations. It seems that typically pymc3 returns a log likelihood distribution for every y-value, so if your dataset is N=10000, you’d have 10000 log likelihood distributions. The solution was to remove the sum operation so it was instead the following:
return logp
I noticed on this PyMC3 doc page that the exponential survival function is recommended to have the logp.sum() operation as shown below
def logp(failure, value):
return (failure * log(λ) - λ * value).sum()
exp_surv = pm.DensityDist('exp_surv', logp, observed={'failure':failure, 'value':t})
Is there a reason for this? Is it just to speed things up?
My remaining question is the following:
Is WAIC valid for this hierarchical censored survival model? I’m now getting warnings that some log predictive densities are > 0.4