Handling imprecise travel time data in a model

So I am working with a data set involving fire truck arrival times. The time an alarm was sounded is precisely known, as it is recorded by the emergency services system. The arrival times do not appear to be precise: every recorded arrival time is an exact whole number of minutes (e.g., 5, 6, 10, or 15 minutes) after the alarm-received time. I would like to introduce an error term to handle this, as I do not believe I have sufficient information to estimate this as a censored variable. Has anyone encountered a similar problem, and do you have advice on what sort of error term to introduce?

What sort of model are you using (or planning on using)? Most probabilistic models model means and assume that observed data is only a noisy version of the model’s mean. Is that sufficient? Do you have further information to model the error itself (e.g., the fact that the arrival time was recorded as 10 minutes post-alarm but it may have been 9.2 minutes)?

Planning on modeling fire engine arrival times using a failure-time model.

No information on error. Just an alarm time, an arrival time, and various other data that would not inform the error.

Ah, so that’s a bit of the other way around from what I was thinking (naively). My suggestion would be to begin with simulation studies and parameter recovery. So as to not waste time duplicating simulation and model code, I would suggest using the technique outlined in this blog post.
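As a rough illustration of what a parameter-recovery check could look like (this is not the technique from the blog post), here is a minimal sketch. It assumes, purely for illustration, that true arrival times are exponential and that recorded times are rounded up to the next whole minute; the function names and the rate of 0.2 per minute are made up. Under those assumptions the recorded minute is geometric, so the rounding-aware estimate has a closed form that can be compared with the naive one.

```python
import math
import random

def simulate_recorded_minutes(rate, n, rng):
    """Draw n true times from Exponential(rate) and round each up to a whole minute."""
    # max(1, ...) guards against the (practically impossible) draw of exactly 0.0.
    return [max(1, math.ceil(rng.expovariate(rate))) for _ in range(n)]

def naive_rate(recorded):
    """Exponential MLE that treats the recorded minutes as exact times."""
    return len(recorded) / sum(recorded)

def rounding_aware_rate(recorded):
    """MLE when each true time lies in (k - 1, k] for recorded minute k.

    The recorded minute is then geometric with success probability
    1 - exp(-rate). Assumes the mean recorded minute is greater than 1.
    """
    mean_k = sum(recorded) / len(recorded)
    return -math.log(1.0 - 1.0 / mean_k)

rng = random.Random(1)   # fixed seed so the run is reproducible
true_rate = 0.2          # hypothetical value: mean true time of 5 minutes
recorded = simulate_recorded_minutes(true_rate, 20000, rng)

print("true rate:          ", true_rate)
print("naive estimate:     ", naive_rate(recorded))
print("rounding-aware est.:", rounding_aware_rate(recorded))
```

If the rounding-aware estimate lands near the simulated rate while the naive one does not, that is some evidence the model handles the discretization; the same check can be repeated with whatever failure-time distribution you actually plan to use.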

Thank you, I'll take a look.

You’re looking for what’s called a “measurement error model.” You have to know how the times are recorded or model it in some way. Are they rounded to the nearest minute or rounded up or down? Once you know that, the simplest thing to do is treat the true time as an unknown parameter with a uniform distribution among the times that would produce the discretized time.

For example, if you see an observation of $y_n = 7$, and the discretization is by rounding down, then you introduce $y^\text{true}_n \sim \text{uniform}([7, 8))$ and use $y^\text{true}_n$ wherever you would’ve used $y_n$ had it been measured exactly. (If the times were instead rounded up, the interval would be $(6, 7]$.) This gives you inference for the true values as well as whatever other parameters you care about.
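To make the mapping from a recorded time to its set of possible true times concrete, here is a small sketch. The function names, the `mode` strings, and the one-minute `width` are illustrative assumptions, not part of any library. In a full model the true time would be a parameter constrained to this interval and updated by the sampler; the uniform draw below only shows what the prior over the true time looks like.

```python
import random

def true_time_interval(recorded, mode, width=1.0):
    """Return (lower, upper) bounds on the true times that yield `recorded`.

    mode is one of "down", "up", or "nearest" (hypothetical labels).
    """
    if mode == "down":       # floor: true time in [recorded, recorded + width)
        return recorded, recorded + width
    if mode == "up":         # ceiling: true time in (recorded - width, recorded]
        return recorded - width, recorded
    if mode == "nearest":    # true time within half a width of recorded
        return recorded - width / 2, recorded + width / 2
    raise ValueError("mode must be 'down', 'up', or 'nearest'")

def draw_true_time(recorded, mode, rng, width=1.0):
    """Draw one candidate true time uniformly from the implied interval."""
    lower, upper = true_time_interval(recorded, mode, width)
    return rng.uniform(lower, upper)

rng = random.Random(0)
print(true_time_interval(7, "down"))   # bounds for a time recorded as 7, rounded down
print(true_time_interval(7, "up"))     # bounds for a time recorded as 7, rounded up
print(draw_true_time(7, "down", rng))  # one uniform draw from the first interval
```

Which `mode` applies is exactly the thing you need to find out (or model) about how the dispatch system records arrivals.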

An alternative is to use the CDF and integrate out the true times explicitly rather than leaving that to MCMC, but that’s usually a challenge unless the data are conditionally independent. It may also seem like the latent-time approach introduces a lot of new parameters, but HMC is good with high dimensions.
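For the conditionally independent case, the explicit integration amounts to replacing each density term with the probability mass the model puts on the interval, F(upper) - F(lower). Here is a minimal sketch, assuming a Weibull failure-time model chosen only for illustration; the function names and parameter values are hypothetical.

```python
import math

def weibull_cdf(t, shape, scale):
    """Weibull CDF, F(t) = 1 - exp(-(t / scale) ** shape) for t > 0."""
    if t <= 0:
        return 0.0
    return -math.expm1(-((t / scale) ** shape))

def interval_log_lik(intervals, shape, scale):
    """Sum of log(F(upper) - F(lower)) over independent observations.

    Each element of `intervals` is a (lower, upper) pair bounding one
    true time, e.g. (7, 8) for a time recorded as 7 and rounded down.
    """
    total = 0.0
    for lower, upper in intervals:
        mass = weibull_cdf(upper, shape, scale) - weibull_cdf(lower, shape, scale)
        if mass <= 0.0:
            return -math.inf   # parameters put no mass on an observed interval
        total += math.log(mass)
    return total

# Hypothetical recorded minutes, each rounded down to a whole minute.
intervals = [(5, 6), (6, 7), (10, 11), (15, 16)]
print(interval_log_lik(intervals, shape=1.5, scale=9.0))
```

This removes the per-observation latent parameters, at the cost of needing a CDF that is cheap and numerically stable to evaluate.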
