Hi, thank you. Yes, sorry about that. When reading the paper I felt a similar confusion, as it does not look like a ‘proper’ censored approach. But it is (partially, I guess) standard in epidemiology to estimate incubation periods this way. The main issue is that there is no single pair of lower and upper boundaries; instead, every observation has its own. In the common case, each patient has a lower and an upper boundary for a hypothetical/estimated exposure period, and the incubation period is the time of symptom onset minus the time of exposure. So we have, for instance, the earliest possible exposure time (minExp), the latest possible exposure time (maxExp), and the symptom onset (onset). The point is to estimate the ‘true’ value (often called ‘exact’) of the incubation period, which lies between onset - maxExp and onset - minExp.
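To make the per-patient boundaries concrete, here is a minimal sketch of how I understand them to be derived. The function name `incubation_bounds` and the dates are made up for illustration and are not taken from the paper's data; I am also assuming that the result corresponds to the IncP_min and IncP_max columns used in the R code.

```python
from datetime import date


def incubation_bounds(min_exp, max_exp, onset):
    """Return the (shortest, longest) possible incubation period in days."""
    shortest = (onset - max_exp).days  # exposure happened as late as possible
    longest = (onset - min_exp).days   # exposure happened as early as possible
    return shortest, longest


# Hypothetical patient: exposed some time between 10 and 14 January,
# symptoms on 18 January.
bounds = incubation_bounds(date(2020, 1, 10), date(2020, 1, 14), date(2020, 1, 18))
print(bounds)
```

Each patient gets their own pair like this, which is why there is no single global lower or upper boundary.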
The approach in the paper builds the likelihood from the CDFs of Weibull distributions. You can see that in the linked paper and R code; the data and code are in the second link of my initial post. The key bit of their code is this one (I guess):
# prior function
logprior = function(current){
  current$logprior = dunif(current$k, 0, 100, log=TRUE) + dunif(current$theta, 0, 100, log=TRUE)
  current
}
# The likelihood function
loglikelihood = function(current, data){
  current$loglikelihood <- log(pweibull(data$IncP_max, current$k, current$theta) - pweibull(data$IncP_min, current$k, current$theta))
  current
}
This is what I’m trying to translate to PyMC. They sample it with Metropolis; I’d prefer to use HMC.
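For reference, this is how I read that likelihood, written out in plain Python with no PyMC involved. It is only a sketch: the names `weibull_cdf` and `interval_loglik` are mine, and summing the per-observation terms into one number is my assumption (the R function above stores the vector of log terms, and I guess it is summed elsewhere in their code).

```python
import math


def weibull_cdf(t, k, theta):
    """Weibull CDF with shape k and scale theta: 1 - exp(-(t / theta)^k)."""
    if t <= 0:
        return 0.0
    return -math.expm1(-((t / theta) ** k))


def interval_loglik(inc_min, inc_max, k, theta):
    """Sum over patients of log(F(inc_max) - F(inc_min))."""
    total = 0.0
    for lo, hi in zip(inc_min, inc_max):
        total += math.log(weibull_cdf(hi, k, theta) - weibull_cdf(lo, k, theta))
    return total


# Toy boundaries (not the paper's data): three patients, each with its own interval.
print(interval_loglik([2.0, 3.0, 1.0], [7.0, 9.0, 4.0], k=2.0, theta=6.0))
```

So each observation contributes the probability mass that the Weibull puts on its own interval, rather than a density at a single observed value.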
I checked the implementation of PyMC’s Censored and multiple posts on the topic before posting, but the data I’m working with does not seem to match its requirements. As far as I can tell, PyMC’s Censored requires access to the censored data (e.g. a list of values) and two absolute boundaries (upper and lower), whereas my data consists of two lists of per-observation boundaries (395 lower boundaries, 395 upper boundaries).