Unpooled Accelerated Failure Time Model

buckeye17 · November 23, 2024, 4:58am

Well, I spoke too soon. I assumed that if the sampler started then the model was valid. It turns out that the above model only produces an error once the sampler finishes running. Here’s the error message:

ValueError: conflicting sizes for dimension 'groups_flat': length 1354 on the data but length 3770 on coordinate 'groups_flat'

ricardoV94 · November 24, 2024, 3:27pm

Instead of doing shape.eval(), do eval().shape. The first can hide shape errors; the second won’t.

buckeye17 · November 25, 2024, 4:07pm

The models defined above will yield (3770, 3770) for reg[:, group_idx].eval().shape and yield (3770, 5) for aft_model["reg"].eval().shape. I don’t understand why they produce different results. Again (3770, 5) makes sense to me, so it seems reg[:, group_idx].eval().shape is not a helpful evaluation.
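The difference between the two shapes can be reproduced with plain NumPy, whose indexing rules PyTensor mirrors as far as I know. This is a minimal sketch with toy sizes (6 rows and 3 groups instead of 3770 and 5); the names wide, paired, and rows are made up for illustration. Combining a slice with an index array picks a column for every entry of group_idx in every row, giving an (n, n) result, whereas pairing a row index with each group index gives one value per row:

```python
import numpy as np

# Toy sizes; the thread has 3770 rows and 5 groups.
n_obs, n_groups = 6, 3
rng = np.random.default_rng(0)
reg = rng.normal(size=(n_obs, n_groups))
group_idx = rng.integers(n_groups, size=n_obs)

# Slice + index array: wide[i, j] == reg[i, group_idx[j]]
wide = reg[:, group_idx]
print(wide.shape)

# Paired index arrays: paired[i] == reg[i, group_idx[i]]
rows = np.arange(n_obs)
paired = reg[rows, group_idx]
print(paired.shape)

# The per-row values are the diagonal of the (n, n) result.
print(np.allclose(paired, np.diagonal(wide)))
```

So (3770, 3770) is the expected shape for that expression rather than a sign that the evaluation is unhelpful; the paired form is the one that gives a single scale per observation.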

I’m happy to say that I figured out how to fix the latest error. I verified that the model is now able to do all forms of prior & posterior MCMC sampling. The fix came by splitting the groups_flat dimension into two dimensions named groups_cens_flat & groups_uncens_flat.
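Why the split helps, as I understand it: the error says the data had length 1354 along groups_flat while the coordinate had length 3770, so a variable defined on only a subset of rows needs a coordinate with exactly that subset's length. Below is a small sketch of that bookkeeping with made-up toy arrays; left, y, and the helper check_dim are hypothetical and only stand in for the real data.

```python
import numpy as np

# Hypothetical toy stand-ins for retention_df.left, the group labels, and y.
left = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0])
group_values = np.array([0, 2, 1, 1, 0, 2, 2])
y = np.array([3, 12, 5, 7, 12, 2, 12])

# Row positions of censored and uncensored observations.
cens = np.flatnonzero(left == 0.0)
uncens = np.flatnonzero(left == 1.0)

# One coordinate per subset, each with one label per row in that subset.
coords = {
    "groups_cens_flat": group_values[cens],
    "groups_uncens_flat": group_values[uncens],
}


def check_dim(coords, dim, values):
    """Return True when the coordinate for `dim` has one label per value."""
    return len(coords[dim]) == len(values)


print(check_dim(coords, "groups_uncens_flat", y[uncens]))
print(check_dim(coords, "groups_cens_flat", y[cens]))
# A single coordinate covering every row cannot match either subset.
print(len(group_values) == len(y[uncens]))
```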

Here’s the final model:

```python
import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as pt

# retention_df is assumed to be defined as earlier in the thread.
intervals = np.arange(12)
n_groups = 5
unique_groups = list(range(n_groups))
# Random group label for every row
group_values = np.random.randint(n_groups, size=retention_df.shape[0])
group_idx = pd.Categorical(group_values, categories=unique_groups).codes
# Row positions of censored (left == 0) and uncensored (left == 1) observations
cens = [i for i, x in enumerate(retention_df.left) if x == 0.0]
uncens = [i for i, x in enumerate(retention_df.left) if x == 1.0]
preds = [
    "sentiment",
    "intention",
    "Male",
    "Low",
    "Medium",
    "Finance",
    "Health",
    "Law",
    "Public/Government",
    "Sales/Marketing",
]
coords = {
    "intervals": intervals,
    "groups": unique_groups,
    # Separate coordinates so each matches the length of its own subset
    "groups_cens_flat": group_values[cens],
    "groups_uncens_flat": group_values[uncens],
    "preds": preds,
}

X = retention_df[preds].copy()
y = retention_df["month"].values


def weibull_lccdf(x, alpha, beta):
    """Log complementary cdf of Weibull distribution."""
    return -((x / beta) ** alpha)


with pm.Model(coords=coords, check_bounds=False) as aft_model:
    # Recent PyMC releases prefer pm.Data; MutableData is kept as in the original post.
    X_data = pm.MutableData("X_data_obs", X)
    beta = pm.Normal("beta", 0.0, 1, dims="preds")
    mu = pm.Normal("mu", 0, 1, dims="groups")
    s = pm.HalfNormal("s", 5.0, dims="groups")
    eta = pm.Deterministic("eta", pm.math.dot(beta, X_data.T))
    # Shape (n_obs, n_groups): one scale per observation and group
    reg = pm.Deterministic("reg", pt.exp(-(mu[None, :] + eta[:, None]) / s[None, :]))
    # Uncensored rows: paired indexing picks each row's own group
    y_obs = pm.Weibull(
        "y_obs",
        beta=reg[uncens, group_idx[uncens]],
        alpha=s[group_idx[uncens]],
        observed=y[uncens],
        dims="groups_uncens_flat",
    )
    # Censored rows contribute their log survival probability
    y_cens = pm.Potential(
        "y_cens",
        weibull_lccdf(y[cens], alpha=s[group_idx[cens]], beta=reg[cens, group_idx[cens]]),
        dims="groups_cens_flat",
    )

    idata = pm.sample_prior_predictive()
    idata.extend(pm.sample())
    idata.extend(pm.sample_posterior_predictive(idata))
```
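For reference, the expression in weibull_lccdf follows from the Weibull CDF under the (alpha = shape, beta = scale) parameterization that pm.Weibull uses. A short derivation, where S for the survival function is my notation rather than the thread's:

```latex
\begin{aligned}
F(x \mid \alpha, \beta) &= 1 - \exp\!\left[-\left(\frac{x}{\beta}\right)^{\alpha}\right] \\
S(x \mid \alpha, \beta) &= 1 - F(x \mid \alpha, \beta)
  = \exp\!\left[-\left(\frac{x}{\beta}\right)^{\alpha}\right] \\
\log S(x \mid \alpha, \beta) &= -\left(\frac{x}{\beta}\right)^{\alpha}
\end{aligned}
```

The pm.Potential term adds these log-survival values for the censored rows to the model's log-probability, which is how right-censoring enters the likelihood here.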

ricardoV94 · November 25, 2024, 4:53pm

PyTensor assumes you have a valid shape graph, and gives you an output based on that assumption. In your case you had something invalid.

https://pytensor.readthedocs.io/en/latest/tutorial/shape_info.html#problems-with-shape-inference
