I am trying to implement a white paper:
1-s2.0-S0377221718309159-main copy.pdf (2.9 MB)
Cox PHM looks like this:
Where Lamda_D is the hazard rate for default.
r_D is the baseline hazard rate for default
Theta_D and X_D are regression parameters and covariates
In this paper they have generated the likelihood function like this :
t_d,t_p are your realization of Random Variable T_d and T_p respectively.
I am trying to convert this problem into PYMC world and haven’t been able to implement it correctly yet.
I have an input dataset where each row is a mortgage, set of covariates and time_to_default/prepayment (depends on the cause of failure). I have tried the following:
def standardize(x, mu, sigma): return (x - mu) / sigma def standardNormCdf(x): return 0.5 + 0.5 * pm.math.erf(x / pm.math.sqrt(2)) def getContributionFromInterval(interval, mu, sigma): logT = log(interval) a = standardize(logT, mu, sigma) return log(1 - standardNormCdf(a)) def computeFailureRate(sigma, mu, t, theta, data): logT = log(t) a1 = standardize(log(t), mu, sigma) a2 = 0.5 * log(2 * pi * pow(sigma, 2)) B = 0.5 * pow(a1, 2) C = log(1 - standardNormCdf(a1)) D = (theta * data).sum(axis=-1).reshape((-1, 1)) failureRate = ((-1) * logT) - a2 - B - C + D return failureRate def getSurvival(time_to_event, mu, sigma, theta, features): contributionFromFinalInterval = (-1) * getContributionFromInterval(time_to_event, mu, sigma) theta_features_vector = (theta * features).sum(axis=-1) theta_features_vector = theta_features_vector.reshape((-1, 1)) survival = (contributionFromFinalInterval * theta_features_vector) survival = survival.sum(axis=1).reshape((-1, 1)) return survival theta_D = pm.Normal('theta_D', mu=0, sigma=100, shape=(1, features_shape)) theta_P = pm.Normal('theta_P', mu=0, sigma=100, shape=(1, features_shape)) mu_D = pm.Normal('mu_D', mu=0, sigma=10) mu_P = pm.Normal('mu_P', mu=0, sigma=10) sigma_D = pm.Exponential('sigma_D', .01) sigma_P = pm.Exponential('sigma_P', .01) def logp(time_to_event, mu_P, mu_D, sigma_P, sigma_D, theta_D, theta_P, event, features): failureRate = where( pt.eq(event, 0), computeFailureRate(sigma_D, mu_D, time_to_event, theta_D, features), where(pt.eq(event, 1), computeFailureRate(sigma_P, mu_P, time_to_event, theta_P, features), 1) ) defaultSurvival = getSurvival(time_to_event, mu_D, sigma_D, theta_D, features) prepaymentSurival = getSurvival(time_to_event, mu_P, sigma_P, theta_P, features) return (failureRate - defaultSurvival - prepaymentSurival).flatten().sum() l = logp(time_to_event, mu_P, mu_D, sigma_P, sigma_D, theta_D, theta_P, events, features_num) likelihood = pm.CustomDist('LL', mu_P, mu_D, sigma_P, sigma_D, theta_D, theta_P, events, features_num, logp=logp, observed=time_to_event) # trace = pm.sampling_jax.sample_numpyro_nuts(draws=1000,tune=1000, chains=4,chain_method='parallel') trace = pm.sample(draws=1000, tune=1000, chains=4) return trace
Something that I am failing to understand is what goes into the observed keyword? I feel I am messing things up there.
I would be really grateful if someone could shed some light on how I could achieve this in PYMC