The models defined above will yield (3770, 3770)
for reg[:, group_idx].eval().shape
and yield (3770, 5)
for aft_model["reg"].eval().shape
. I don’t understand why they produce different results. Again (3770, 5)
makes sense to me, so it seems reg[:, group_idx].eval().shape
is not a helpful evaluation.
I’m happy to say that I figured out how to fix the latest error. I verified that the model is now able to do all forms of prior & posterior MCMC sampling. The fix came by splitting the groups_flat
dimension into two dimensions named groups_cens_flat
& groups_uncens_flat
.
Here’s the final model:
intervals = np.arange(12)
n_groups = 5
unique_groups = list(range(n_groups))
group_values = np.random.randint(n_groups, size=retention_df.shape[0])
group_idx = pd.Categorical(group_values, categories=unique_groups).codes
cens = [i for i, x in enumerate(retention_df.left) if x == 0.0]
uncens = [i for i, x in enumerate(retention_df.left) if x == 1.0]
coords = {
"intervals": intervals,
"groups": unique_groups,
"groups_cens_flat": group_values[cens],
"groups_uncens_flat": group_values[uncens],
"preds": [
"sentiment",
"intention",
"Male",
"Low",
"Medium",
"Finance",
"Health",
"Law",
"Public/Government",
"Sales/Marketing",
],
}
X = retention_df[
[
"sentiment",
"intention",
"Male",
"Low",
"Medium",
"Finance",
"Health",
"Law",
"Public/Government",
"Sales/Marketing",
]
].copy()
y = retention_df["month"].values
def weibull_lccdf(x, alpha, beta):
"""Log complementary cdf of Weibull distribution."""
return -((x / beta) ** alpha)
with pm.Model(coords=coords, check_bounds=False) as aft_model:
X_data = pm.MutableData("X_data_obs", X)
beta = pm.Normal("beta", 0.0, 1, dims="preds")
mu = pm.Normal("mu", 0, 1, dims="groups")
s = pm.HalfNormal("s", 5.0, dims="groups")
eta = pm.Deterministic("eta", pm.math.dot(beta, X_data.T))
reg = pm.Deterministic("reg", pt.exp(-(mu[None, :] + eta[:, None]) / s[None, :]))
y_obs = pm.Weibull(
"y_obs",
beta=reg[uncens, group_idx[uncens]],
alpha=s[group_idx[uncens]],
observed=y[uncens],
dims="groups_uncens_flat"
)
y_cens = pm.Potential(
"y_cens",
weibull_lccdf(y[cens], alpha=s[group_idx[cens]], beta=reg[cens, group_idx[cens]]),
dims="groups_cens_flat"
)
idata = pm.sample_prior_predictive()
idata.extend(pm.sample())
idata.extend(pm.sample_posterior_predictive(idata))
```