Simple Exponential Survival Funciton

I am trying to model the same problem as this question (I am translating the examples in the 2023 rethinking course here however I am having a problem with my implementation and I cannot completely follow the solution provided in the above (especially after the ==>).

My solution:

def log_ccdf_exponential(x, lam):
    return -pm.math.log(1 - pm.math.exp(-lam * x))

with pm.Model() as cat_censored_m:
    days_to_event = np.array(cats['days_to_event'])
    adopted = np.array([0 if x == 'Adoption' else 1 for x in cats['out_event']])
    black = np.array([1 if x == 'Black' else 0 for x in cats['color']])
    
    a = pm.Normal('a', 0, 1,shape=2)
    log_rate = a[black]
    y_cens = pm.Potential('y_cens', log_ccdf_exponential(days_to_event[adopted==1],lam=log_rate[adopted==1]))
    y_obs  = pm.Exponential('y_obs',lam=log_rate, observed=days_to_event[adopted==0])
   
    cat_trace = pm.sample() 

This brings up a broadcasting error. Input dimension mismatch. One other input has shape[0] = 22356, but input[2].shape[0] = 11351.

22356 is the length of the entire dataset. 11351 is the length of days_to_event[adopted==0].

Thanks in advance for any help.

This line, it looks like forgot to index the relevant log_rate

Should be

y_obs  = pm.Exponential('y_obs',lam=log_rate[adopted==0], observed=days_to_event[adopted==0])

Unrelated to the shape issues, you could also use pm.Censored https://www.pymc.io/projects/docs/en/stable/api/distributions/generated/pymc.Censored.html

with pm.Model() as cat_censored_m:
    days_to_event = np.array(cats['days_to_event'])
    adopted = np.array([0 if x == 'Adoption' else 1 for x in cats['out_event']])
    black = np.array([1 if x == 'Black' else 0 for x in cats['color']])
    
    a = pm.Normal('a', 0, 1,shape=2)
    log_rate = a[black]

    upper_censoring = days_to_event.copy()
    upper_censoring[adopted==0] = np.inf  

    y_obs = pm.Censored(
        'y_obs', 
        pm.Exponential.dist(lam=log_rate),
        upper=upper_censoring, 
        observed=days_to_event,
    )
   
    cat_trace = pm.sample() 
1 Like

Thank you for the response!

Thanks for the pm.Censored(() example, my other solution appears to hang whilst sampling.

I currently have the following solution which provides ‘reasonable’ numbers for a.

with pm.Model() as cat_censored_m:
    days_to_event = np.array(cats['days_to_event'],dtype=float)
    adopted = np.array([1 if x == 'Adoption' else 0 for x in cats['out_event']])
    black = np.array([0 if x == 'Black' else 1 for x in cats['color']])
    
    a = pm.Normal('a', 0, 1,shape=2)
    log_rate = 1/a[black]

    upper_censoring = days_to_event.copy()
    upper_censoring[adopted==0] = np.inf  

    y_obs = pm.Censored(
        'y_obs', 
        pm.Exponential.dist(lam=log_rate),
        lower = None,
        upper=upper_censoring, 
        observed=days_to_event,
    )
   
    cat_censored_trace = pm.sample(initvals={'a':np.array([10,10])})

I had to set the initvals otherwise it would fail when trying to sample below 0.

Unfortunately this appears to be giving me an erroneous result. I expect the Black cats to have a higher, more dispersed a distribution. As shown in the lecture here. However I obtain the following distribution (a[0] == black cats). Any insight as to what’s going on?

Maybe the encodings got mixed (or which category is censored and which is not), or the lecture is actually modeling in terms of rate directly and not mean (that is a and not 1/a). Just trying to guess

About the initvals, why not use a lognormal prior or something that has only positive support instead of a normal prior?