GSoC 2026 Actuarial Survival Models: Tackling Left-Truncation & Declarative API Vision

Hi PyMC Team and Mentors,

My name is Shuo Zhang, an MSc grad in Financial & Insurance Mathematics at Charles University and an Actuary Candidate (passed SOA exams P & FM). I am incredibly excited about the Survival Models project for GSoC 2026.

I’ve been closely following the community’s recent explorations around right-censoring (like the excellent customer churn notebook) and the deep discussions on the numerical stability of logccdf for censored distributions. Building on this foundation, I would love to bring a rigorous actuarial perspective to the table.

In life and health insurance pricing, much of the mathematical complexity arises from left-truncation (e.g., policyholders who enter a mortality study at age 40 and are therefore observed only conditional on having survived to that age). While right-censoring is elegantly handled by pm.Censored, left-truncation currently requires actuaries to manually construct complex, numerically sensitive tensor graphs using pm.Potential.
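For concreteness, here is the standard likelihood contribution I have in mind, written in notation I am introducing for this post: subject $i$ enters observation at age $a_i$, exits at age $t_i$, and has event indicator $\delta_i$ (1 for an observed death, 0 for right-censoring), with density $f$, survival function $S$, and hazard $h = f/S$. Conditioning on survival to the entry age divides the usual censored likelihood by $S(a_i)$:

$$
\log L_i \;=\; \delta_i \log f(t_i) \;+\; (1-\delta_i)\,\log S(t_i) \;-\; \log S(a_i)
\;=\; \delta_i \log h(t_i) \;+\; \log S(t_i) \;-\; \log S(a_i).
$$

The $-\log S(a_i)$ term is the piece that has to be added by hand today, and it is where the numerical care is needed when $S(a_i)$ is close to 0 or 1.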

To demonstrate this, I have developed a Jupyter Notebook that fits a left-truncated Accelerated Failure Time (AFT) Weibull model: ZHANGSHUO22/PyMC-Actuarial-Survival-GSoC26 on GitHub (demonstrating left-truncated AFT survival models and the declarative API vision for PyMC).

In the notebook, I simulate a synthetic life insurance cohort and add the conditional log-likelihood terms by hand. To keep the computation numerically stable (avoiding catastrophic cancellation and log(0) warnings), I bypassed the generic fallbacks and implemented the exact closed-form Weibull log-survival function, combined with a log-link GLM structure. The NUTS sampler recovers the true baseline hazards and covariate effects used to simulate the data.
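To make the idea concrete without reproducing the notebook, here is a minimal plain-Python sketch of the same arithmetic. It is not the notebook's code: the function names, the shape/scale parameterization, and the example numbers are assumptions I chose for illustration. It uses the closed form log S(t) = -(t / scale)^shape and a log link for the scale. Inside a PyMC model the same expressions would be written with tensor operations and added through pm.Potential.

```python
import math


def weibull_log_sf(t, shape, scale):
    """Closed-form Weibull log-survival: log S(t) = -(t / scale) ** shape.

    Computing this directly avoids log(1 - cdf), which underflows to
    log(0) once the cdf rounds to 1.0 in floating point.
    """
    return -((t / scale) ** shape)


def weibull_log_pdf(t, shape, scale):
    """Weibull log-density for t > 0, computed on the log scale."""
    z = math.log(t) - math.log(scale)  # log(t / scale)
    return math.log(shape) - math.log(scale) + (shape - 1.0) * z - math.exp(shape * z)


def aft_scale(covariates, coefficients):
    """Log-link AFT regression: scale = exp(x . beta)."""
    return math.exp(sum(x * b for x, b in zip(covariates, coefficients)))


def left_truncated_loglik(entry, exit_age, event, shape, scale):
    """Log-likelihood of one subject observed from `entry` to `exit_age`.

    event=True  -> death observed at exit_age (density term)
    event=False -> right-censored at exit_age (survival term)
    Subtracting log S(entry) conditions on survival to the entry age.
    """
    if event:
        numerator = weibull_log_pdf(exit_age, shape, scale)
    else:
        numerator = weibull_log_sf(exit_age, shape, scale)
    return numerator - weibull_log_sf(entry, shape, scale)


# Hypothetical subject: enters at age 40, censored at age 50,
# exponential special case (shape = 1) with scale 10.
print(left_truncated_loglik(40.0, 50.0, False, shape=1.0, scale=10.0))  # -1.0
```

With an entry age of 0 the correction term vanishes (log S(0) = 0), so the function reduces to the ordinary right-censored Weibull log-likelihood, which is a convenient sanity check.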

The GSoC 2026 Vision: While the manual tensor implementation is mathematically sound, it is too complex for typical end users. My proposal aims to abstract these operations into a declarative formula API (analogous to Bambi/CausalPy). The vision (sketched in the repository linked above) is to let actuaries and data scientists specify truncation and censoring in the model specification itself, without writing the tensor graph by hand.

I am currently finalizing my formal PDF proposal based on this architectural blueprint. I would greatly appreciate any early feedback from the core developers on this formula-based approach to truncation!

Looking forward to contributing to the PyMC ecosystem!