Unpooled Accelerated Failure Time Model

Well, I spoke too soon. I assumed that if the sampler started then the model was valid. It turns out that the above model only produces an error once the sampler finishes running. Here’s the error message:

ValueError: conflicting sizes for dimension 'groups_flat': length 1354 on the data but length 3770 on coordinate 'groups_flat'

instead of doing shape.eval(), do eval().shape. The first can hide shape errors, the second won’t

The models defined above will yield (3770, 3770) for reg[:, group_idx].eval().shape and yield (3770, 5) for aft_model["reg"].eval().shape. I don’t understand why they produce different results. Again (3770, 5) makes sense to me, so it seems reg[:, group_idx].eval().shape is not a helpful evaluation.

I’m happy to say that I figured out how to fix the latest error. I verified that the model is now able to do all forms of prior & posterior MCMC sampling. The fix came by splitting the groups_flat dimension into two dimensions named groups_cens_flat & groups_uncens_flat.

Here’s the final model:

intervals = np.arange(12)
n_groups = 5
unique_groups = list(range(n_groups))
group_values = np.random.randint(n_groups, size=retention_df.shape[0])
group_idx = pd.Categorical(group_values, categories=unique_groups).codes
cens = [i for i, x in enumerate(retention_df.left) if x == 0.0]
uncens = [i for i, x in enumerate(retention_df.left) if x == 1.0]
coords = {
    "intervals": intervals,
    "groups": unique_groups,
    "groups_cens_flat": group_values[cens],
    "groups_uncens_flat": group_values[uncens],
    "preds": [
        "sentiment",
        "intention",
        "Male",
        "Low",
        "Medium",
        "Finance",
        "Health",
        "Law",
        "Public/Government",
        "Sales/Marketing",
    ],
}

X = retention_df[
    [
        "sentiment",
        "intention",
        "Male",
        "Low",
        "Medium",
        "Finance",
        "Health",
        "Law",
        "Public/Government",
        "Sales/Marketing",
    ]
].copy()
y = retention_df["month"].values


def weibull_lccdf(x, alpha, beta):
    """Log complementary cdf of Weibull distribution."""
    return -((x / beta) ** alpha)

with pm.Model(coords=coords, check_bounds=False) as aft_model:
    X_data = pm.MutableData("X_data_obs", X)
    beta = pm.Normal("beta", 0.0, 1, dims="preds")
    mu = pm.Normal("mu", 0, 1, dims="groups")
    s = pm.HalfNormal("s", 5.0, dims="groups")
    eta = pm.Deterministic("eta", pm.math.dot(beta, X_data.T))
    reg = pm.Deterministic("reg", pt.exp(-(mu[None, :] + eta[:, None]) / s[None, :]))
    y_obs = pm.Weibull(
        "y_obs",
        beta=reg[uncens, group_idx[uncens]],
        alpha=s[group_idx[uncens]],
        observed=y[uncens],
        dims="groups_uncens_flat"
    )
    y_cens = pm.Potential(
        "y_cens",
        weibull_lccdf(y[cens], alpha=s[group_idx[cens]], beta=reg[cens, group_idx[cens]]),
        dims="groups_cens_flat"
    )
    
    idata = pm.sample_prior_predictive()
    idata.extend(pm.sample())
    idata.extend(pm.sample_posterior_predictive(idata))
    ```

PyTensor assumes you have a valid shape graph, and gives you an output based on that assumption. In you case you had something invalid.

https://pytensor.readthedocs.io/en/latest/tutorial/shape_info.html#problems-with-shape-inference