Hi!
I’m trying to build a statistical model for the following scenario.
There are N machines which can stop working at some point in time T. The machines are checked on some schedule if they work or not. When they are checked the time of check Tc and the state of the machine (0-working, 1 not-working) are recorded. There is one complication: with a probability of p the sensor measuring the state may fail itself and in that case the observed state is always 0.
I need to estimate T and p from the data. For simplicity, let’s assume that each machine is checked only once to not deal with measurements of each individual subject.
This problem is interval-censored, I never observe the true value of T, it is either left-censored (in case of failure = 1) or right-censored (in case of failure = 0):
df = df_simulated.assign(
lower=lambda df: np.where(df["observed_failure"] == 1, df["check_day"], 0),
upper=lambda df: np.where(df["observed_failure"] == 0, df["check_day"], np.inf),
)
I fit it like this:
with pm.Model() as m2:
α = pm.Exponential('α', 1)
β = pm.Exponential('β', 1)
time_to_event = pm.Weibull.dist(alpha = α, beta = β)
obs = pm.Censored('obs', dist = time_to_event, upper=df.upper, lower=df.lower, observed = df.check_day)
And it works beautifully for p = 0
, I can recover the predefined failure rate in the simulation.
For p > 0
the model starts to over-estimate the rate as it observes more of “non-failures” than it should.
Any help would be great on how to add the hurdle parameter to the model. I understand that I need to modify the log-likelihood somehow, but my problem that I have two observed parameters: failure and the time. Should I use pm.Potential
or there are a cleaner way?