While going through the codebase, I noticed that logccdf (the log complementary CDF, equivalent to the log survival function \log S(t) = \log(1 - F(t))) has a numerically stable registered implementation for only one distribution: Normal, via normal_lccdf in dist_math.py.
All other distributions fall back to the generic _logccdf_helper in logprob/abstract.py:
logccdf = log1mexp(logcdf) # i.e. log(1 - exp(logcdf(t)))
This is numerically unstable when t is small. As t \to 0, \log \text{CDF}(t) \to -\infty, and computing \log(1 - \exp(-\infty)) involves catastrophic cancellation in the intermediate exp step.
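To make the failure mode concrete, here is a minimal NumPy sketch (my own illustration using scipy rather than PyMC code) of where the composed fallback genuinely loses information: once logcdf rounds to exactly 0.0 in float64, no rearrangement of log(1 - exp(·)) can recover log S(t), while a direct log-survival implementation still can:

```python
import numpy as np
from scipy import stats

t = 40.0
lcdf = stats.norm.logcdf(t)             # underflows to exactly 0.0 in float64
with np.errstate(divide="ignore"):
    fallback = np.log1p(-np.exp(lcdf))  # log(1 - 1) -> -inf
direct = stats.norm.logsf(t)            # finite, roughly -804.6
```

This is the tail where Normal's registered `normal_lccdf` pays off over the generic composition.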
Why this matters for survival analysis
The log-likelihood for a survival model with censored observations is:
\log L = \sum_{\text{uncensored}} \log f(t_i) + \sum_{\text{censored}} \log S(t_i)
The second term — the likelihood contribution of every right-censored observation — is exactly logccdf. Since pymc/logprob/censoring.py calls _logccdf_helper for the censored likelihood, any Bayesian survival model using Censored with these distributions currently uses the unstable fallback.
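As an illustrative sketch of that likelihood (my own example with scipy and an Exponential model; the data and the `censored` mask are made up, and `pm.Censored` assembles the same terms internally):

```python
import numpy as np
from scipy import stats

# Simulated right-censored data: events after time 3.0 are censored.
rng = np.random.default_rng(0)
t = rng.exponential(scale=2.0, size=200)
censored = t > 3.0
t = np.minimum(t, 3.0)

mu = 2.0
# Uncensored points contribute log f(t_i); censored points contribute
# log S(t_i), i.e. exactly the logccdf evaluated at the censoring time.
loglik = (stats.expon.logpdf(t[~censored], scale=mu).sum()
          + stats.expon.logsf(t[censored], scale=mu).sum())
```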
Proposed fix
Four distributions have closed-form log survival functions that are simpler and more stable than log1mexp(logcdf):
| Distribution | \log S(t) |
| --- | --- |
| Exponential(μ) | -t/\mu |
| Weibull(α, β) | -(t/\beta)^\alpha |
| LogNormal(μ, σ) | normal_lccdf(μ, σ, log(t)) — reuses existing dist_math function |
| Gamma(α, scale) | \log \Gamma_\text{upper}(\alpha,\, t/\text{scale}) via pt.gammaincc |
I’ve verified all four against scipy.stats.*.logsf numerically. For Weibull in particular, the closed form is elegant: logcdf already computes -(t/\beta)^\alpha internally and wraps it in log1mexp — the logccdf is simply that exponent directly.
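The spot-checks I ran look roughly like this (my own snippet; parameter values are arbitrary, and I'm using scipy's conventions, where `scale` plays the role of μ):

```python
import numpy as np
from scipy import stats
from scipy.special import gammaincc

t = np.array([0.5, 1.0, 5.0])

# Exponential(mu): log S(t) = -t/mu
mu = 2.0
assert np.allclose(-t / mu, stats.expon.logsf(t, scale=mu))

# Weibull(alpha, beta): log S(t) = -(t/beta)**alpha
alpha, beta = 1.5, 2.0
assert np.allclose(-(t / beta) ** alpha,
                   stats.weibull_min.logsf(t, c=alpha, scale=beta))

# LogNormal(mu, sigma): log S(t) = Normal(mu, sigma) log-survival at log(t)
m, sigma = 0.3, 0.8
assert np.allclose(stats.norm.logsf(np.log(t), loc=m, scale=sigma),
                   stats.lognorm.logsf(t, s=sigma, scale=np.exp(m)))

# Gamma(alpha, scale): log S(t) = log of the regularized upper incomplete
# gamma function (the quantity pt.gammaincc computes)
a, scale = 2.0, 1.5
assert np.allclose(np.log(gammaincc(a, t / scale)),
                   stats.gamma.logsf(t, a=a, scale=scale))
```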
Please let me know if this understanding is correct, or if the current fallback behaviour is intentional for a reason I’ve missed. If the approach looks good, I’ll open an issue and submit a PR.
You may want to show actual evidence of substantial precision loss in those cases. The cases that are mathematically simpler are fine to implement directly though.
Thank you for correcting me, I must have missed that.
pt.log1mexp already implements the Mächler two-branch formula. I overstated the instability argument.
Taking your point directly: for Weibull, logcdf computes log1mexp(-(t/β)^α) and the fallback then applies log1mexp again, where the answer is simply -(t/β)^α. For LogNormal, normal_lccdf(μ, σ, log t) already exists in dist_math.py and is the direct computation, rather than going through the fallback composition.
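To make the Weibull point concrete: on (-∞, 0), log1mexp is its own inverse (log(1 - (1 - e^x)) = x), so applying the fallback to Weibull's logcdf just round-trips the exponent. A standalone NumPy sketch (my own; pt.log1mexp's real two-branch implementation lives in PyTensor):

```python
import numpy as np

def log1mexp(x):
    # Mächler's two-branch formula for log(1 - exp(x)), x <= 0:
    # log(-expm1(x)) near zero, log1p(-exp(x)) otherwise.
    x = np.asarray(x, dtype=float)
    with np.errstate(divide="ignore"):
        return np.where(x > -np.log(2.0),
                        np.log(-np.expm1(x)),
                        np.log1p(-np.exp(x)))

alpha, beta = 1.5, 2.0
t = np.array([0.5, 1.0, 4.0])
z = -(t / beta) ** alpha       # Weibull log-survival exponent
logcdf = log1mexp(z)           # what Weibull.logcdf computes
# The fallback logccdf = log1mexp(logcdf) recovers z exactly:
assert np.allclose(log1mexp(logcdf), z)
```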
Would it make sense to scope the PR to just Weibull and LogNormal on the basis of simplicity rather than precision? Looking forward to your inputs!
Thanks
While working on this issue, I was running the test suite locally on the v6 branch and noticed two tests are consistently failing on a clean checkout.
1. test_weibull_icdf fails due to a precision mismatch at the 6th decimal:
   ACTUAL:  array(67.682719)
   DESIRED: array(67.682715)
2. test_kumaraswamy fails with a TypeError caused by a complex-number cast in PyTensor:
   TypeError: float() argument must be a string or a real number, not 'complex'
   Value triggering the error: x = (0.9995065603657316+0.03141075907812829j)
Environment:
OS: Windows
Python: 3.13.12
Branch: v6
Just wanted to flag these for visibility! While I'm working on the logccdf issue, do you want me to open a separate bug report? I would be happy to push a patch for that as well.
Let me know your thoughts: should it be part of the same PR, or a separate one?
Thanks for clarifying the log1mexp rewrite rule! I wasn’t fully aware that PyTensor’s graph optimizer caught that specific sequence to prevent the cancellation. That makes a lot of sense.
Even with the rewrite, would having explicit, closed-form logccdf registrations for distributions like Weibull and LogNormal still be preferred for clarity and minor performance gains, or should I hold off on opening a PR for this? Kindly let me know!
Regarding the complex casting issue, I ran into it locally while running the full test suite on a clean checkout of the v6 branch. Specifically, pytest tests/distributions/test_continuous.py -k "test_kumaraswamy" fails with this error:
TypeError: float() argument must be a string or a real number, not 'complex'
It seems to occur when evaluating value = -1.0 and a = 0.01 within the scipy_log_cdf fallback inside test_kumaraswamy.
Not sure how you prefer to receive the verbose logs, so here is the GitHub Gist; please let me know if there is a better way to share them!
@ricardoV94 I will wait for your input on the following two questions before taking any next steps:
1. Given the log1mexp rewrite rule, does the original intent of the PR (simpler, stable closed-form log functions for Exponential and Weibull) still make sense? There might be a small performance gain and the code would be more explicit, but I will defer to your expertise.
2. Regarding the failing tests, I did a clean install and encountered them again; I have posted the Gist with the terminal output and am happy to provide further details. I have already pinpointed the root cause and can push a minimal patch if that helps.
Looking forward to hearing from you.
Thank you so much for your time and consideration.
I am a practicing researcher and stats student, and I would really like to contribute. If you have any other directions/issues where I can focus, please let me know. Happy to refactor, build, and ship.
re: complex error. Can you run the involved scipy function directly and confirm you're getting a complex output for non-complex inputs, and check whether this is expected per scipy's documentation?
For the logccdf methods: if the direct implementation is just what we do in logcdf minus the log1mexp wrapper, there's no need. It sounds from your description like that is the case for all but LogNormal? In that case, open a PR for LogNormal, but include the others too if it's not just the log1mexp thing.
>>> np.power(-1.0, 0.01)
RuntimeWarning: invalid value encountered in power
nan

np.float64's ** also returns nan:

>>> np.float64(-1.0) ** np.float64(0.01)
RuntimeWarning: invalid value encountered in scalar power
np.float64(nan)
TL;DR - So the behavior depends on which power implementation is used. Python’s native ** produces complex for fractional powers of negative floats, while numpy’s power returns nan.
Note that the reference function in test_kumaraswamy is not from scipy (Kumaraswamy isn’t in scipy). It is a hand-coded lambda that uses value**a, which hits Python’s native ** path.
From the np.power documentation: “Negative values raised to a non-integral value will result in nan (and a warning will be generated).”
So np.power returning nan for (-1.0) ** 0.01 is documented and expected. The issue is that PyTensor’s scalar evaluation path appears to use Python’s native ** instead of numpy’s power, which produces complex rather than nan for the same inputs.
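The difference is easy to check directly (my own two-line snippet):

```python
import numpy as np

py_result = (-1.0) ** 0.01            # Python's ** promotes to complex
with np.errstate(invalid="ignore"):
    np_result = np.power(-1.0, 0.01)  # numpy documents nan here
```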
LogNormal: logccdf is normal_lccdf(μ, σ, log(t)), which is a genuinely different function call, not just logcdf minus log1mexp. It reuses the existing normal_lccdf from dist_math.py, consistent with how Normal.logccdf already works.
> The issue is that PyTensor’s scalar evaluation path appears to use Python’s native ** instead of numpy’s power, which produces complex rather than nan for the same inputs.
Sounds like something we need fixing in PyTensor then. If it didn’t expect complex it shouldn’t yield/fail with one
Hi @ricardoV94, I saw your comments on the PR. I will simplify it; I think I might have over-engineered it. Meanwhile, I spent some time on PyTensor and was able to figure out the root cause.
Traced this to pytensor/scalar/basic.py. The Pow scalar op has an inconsistency between its tensor and scalar evaluation paths: for (-1.0) ** 0.01, the tensor path (via np.power) correctly returns nan, but the scalar path (via Python's **) returns a complex number. The complex value then hits _cast_to_promised_scalar_dtype (line 1175), which fails when trying to cast complex to float64.
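A PyTensor-free reproduction of that final cast failure (my own sketch; the real path goes through _cast_to_promised_scalar_dtype):

```python
value = (-1.0) ** 0.01       # Python's ** yields a complex number
try:
    casted = float(value)    # roughly what the cast to float64 attempts
except TypeError as exc:
    error_message = str(exc) # "... not 'complex'", as seen in the test failure
```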
As I am still not completely versed in the PyTensor codebase, do let me know if I am looking in the wrong place!