I am trying to model the same problem as this question (I am translating the examples in the 2023 rethinking course here however I am having a problem with my implementation and I cannot completely follow the solution provided in the above (especially after the ==>).
My solution:
def log_ccdf_exponential(x, lam):
return -pm.math.log(1 - pm.math.exp(-lam * x))
with pm.Model() as cat_censored_m:
days_to_event = np.array(cats['days_to_event'])
adopted = np.array([0 if x == 'Adoption' else 1 for x in cats['out_event']])
black = np.array([1 if x == 'Black' else 0 for x in cats['color']])
a = pm.Normal('a', 0, 1,shape=2)
log_rate = a[black]
y_cens = pm.Potential('y_cens', log_ccdf_exponential(days_to_event[adopted==1],lam=log_rate[adopted==1]))
y_obs = pm.Exponential('y_obs',lam=log_rate, observed=days_to_event[adopted==0])
cat_trace = pm.sample()
This brings up a broadcasting error. Input dimension mismatch. One other input has shape[0] = 22356, but input[2].shape[0] = 11351.
22356 is the length of the entire dataset. 11351 is the length of days_to_event[adopted==0].
with pm.Model() as cat_censored_m:
days_to_event = np.array(cats['days_to_event'])
adopted = np.array([0 if x == 'Adoption' else 1 for x in cats['out_event']])
black = np.array([1 if x == 'Black' else 0 for x in cats['color']])
a = pm.Normal('a', 0, 1,shape=2)
log_rate = a[black]
upper_censoring = days_to_event.copy()
upper_censoring[adopted==0] = np.inf
y_obs = pm.Censored(
'y_obs',
pm.Exponential.dist(lam=log_rate),
upper=upper_censoring,
observed=days_to_event,
)
cat_trace = pm.sample()
Thanks for the pm.Censored(() example, my other solution appears to hang whilst sampling.
I currently have the following solution which provides ‘reasonable’ numbers for a.
with pm.Model() as cat_censored_m:
days_to_event = np.array(cats['days_to_event'],dtype=float)
adopted = np.array([1 if x == 'Adoption' else 0 for x in cats['out_event']])
black = np.array([0 if x == 'Black' else 1 for x in cats['color']])
a = pm.Normal('a', 0, 1,shape=2)
log_rate = 1/a[black]
upper_censoring = days_to_event.copy()
upper_censoring[adopted==0] = np.inf
y_obs = pm.Censored(
'y_obs',
pm.Exponential.dist(lam=log_rate),
lower = None,
upper=upper_censoring,
observed=days_to_event,
)
cat_censored_trace = pm.sample(initvals={'a':np.array([10,10])})
I had to set the initvals otherwise it would fail when trying to sample below 0.
Unfortunately this appears to be giving me an erroneous result. I expect the Black cats to have a higher, more dispersed a distribution. As shown in the lecture here. However I obtain the following distribution (a[0] == black cats). Any insight as to what’s going on?
Maybe the encodings got mixed (or which category is censored and which is not), or the lecture is actually modeling in terms of rate directly and not mean (that is a and not 1/a). Just trying to guess
About the initvals, why not use a lognormal prior or something that has only positive support instead of a normal prior?