Censoring a truncated distribution

alaz · November 9, 2023, 8:39am

Hello,

I’m trying to implement a right-censored model. However, I need the distribution which will be censored to be a normal distribution left-truncated at zero. I’m stuck with the following error.

NotImplementedError: LogCDF method not implemented for truncated_normal_rv{0, (0, 0, 0, 0), floatX, False}

What I understand is that there’s no logcdf defined for the truncated normal, but I saw a merged pull request here, and it seems that logcdf is in fact implemented?

Censoring a truncated distribution

Following is a prior predictive illustration:

import pymc as pm
import arviz as az
import numpy as np
from copy import copy

with pm.Model() as m:
    a_normal = pm.Normal("n", 1, 1)
    normal_dist = pm.Normal.dist(mu=1.0, sigma=1.0)
    truncated_normal = pm.Truncated("truncated_normal", normal_dist, lower=0)
    truncated_normal_dist = pm.Truncated.dist(normal_dist, lower=0)
    cs_normal = pm.Censored("cs_n", normal_dist, lower=0, upper=3)
    cs_truncated_normal = pm.Censored("cs_tn", truncated_normal_dist, lower=None, upper=3)

with m:
    idata = pm.sample_prior_predictive(samples=10000)

az.plot_dist(idata.prior.stack(samples=("chain", "draw")).n, color='k', label="the normal")
az.plot_dist(idata.prior.stack(samples=("chain", "draw")).truncated_normal, color='b', label="lower-truncated at 0")
az.plot_dist(idata.prior.stack(samples=("chain", "draw")).cs_tn, color='r', label="truncated-censored")
az.plot_dist(idata.prior.stack(samples=("chain", "draw")).cs_n, color='g', label="non-truncated-censored")

Which will output:

So, I can’t use a normal distribution (black) and censor it from both sides (green), what I need is a truncated distribution (blue) to be right-censored (red).

Generate Data

N = 1  # one trial is enough to illustrate the problem
mu, sigma = 1, 2
upper_bounds = np.random.randint(0, 3, N)
y = np.random.normal(loc=mu, scale=sigma, size=N)

def right_censor(y, upper_bounds):
    cy = copy(y)
    censor_index = cy >= upper_bounds
    cy[censor_index] = upper_bounds[censor_index]
    return cy

cy = right_censor(y, upper_bounds)

print(y, upper_bounds, cy)

Inference model

with pm.Model() as m1:
    mu = pm.Normal("mu", 0, 1)
    sigma = pm.Normal("sigma", 0, 1)
    normal_ = pm.Normal.dist(mu, sigma)
    trunc_normal = pm.Truncated.dist(normal_, lower=0)
    censored = pm.Censored("censored", trunc_normal, lower=None, upper=upper_bounds, observed=cy)

Error

Following gives the error.

with m1:
    idata_m1 = pm.sample()

NotImplementedError: LogCDF method not implemented for truncated_normal_rv{0, (0, 0, 0, 0), floatX, False}

The watermark:

Last updated: Thu Nov 09 2023

Python implementation: CPython
Python version : 3.11.6
IPython version : 8.16.1

scipy : 1.11.3
arviz : 0.16.1
pytensor : 2.13.1
pymc : 5.9.0
numpy : 1.25.2
matplotlib: 3.8.0
pandas : 2.1.1

Watermark: 2.4.3

Question

So I want to ask:

Is this the correct approach? Is this error really because the logcdf is not implemented, or is there a user error in model creation part?
What may be a workaround if it’s not implemented yet? What would you suggest?

ricardoV94 · November 9, 2023, 10:30am

TruncatedNormal is a specialized distribution. The PR you saw implemented logcdf for all the other non-specialized cases, but leaves TruncatedNormal still without a logcdf. I’ll open an issue to add it.

github.com/pymc-devs/pymc

Implement logcdf for TruncatedNormal

opened 10:33AM - 09 Nov 23 UTC

ricardoV94

enhancements beginner friendly

### Description We implemented the logcdf for the general Truncated cases in …https://github.com/pymc-devs/pymc/pull/6690, but TruncatedNormal is its own distribution and doesn't have a logcdf currently implemented. We should refactor the logcdf function added in that PR so that we can call it from the logcdf of TruncatedNormal as well as the general Truncated (no point in duplicating the code) Reported in https://discourse.pymc.io/t/censoring-a-truncated-distribution/13261

ricardoV94 · November 9, 2023, 12:38pm

As silly as it may be that means you can use a StudentT with large fixed nu, to get what you want. That will behaves basically like a Normal, except the likelihood could be a bit slower.

alaz · November 21, 2023, 9:47am

Thank you very much for your answers Ricardo. I tried your solution and it works. As I try to replicate the prior part with pm.StudentT I got another error.

with pm.Model() as m2:
    n = pm.Normal("n", 3, 1)
    sT_nu_fixed = pm.ConstantData("sT_nu_fixed", 100)
    sT = pm.StudentT("sT", nu=sT_nu_fixed, mu=3, sigma=1)
    sT_d = pm.StudentT.dist(nu=sT_nu_fixed, mu=3, sigma=1, shape=1) # error without shape argument
    sT_t = pm.Truncated("sT_t", sT_d, lower=2)
    sT_c = pm.Censored("sT_c", sT_d, lower=2, upper=5)
    sT_t_d = pm.Truncated.dist(sT_d, lower=2, shape=1)
    # st_t_c = pm.Censored("sT_t_c", sT_t_d, lower=None, upper=5) # this line gives truncation error
    idata_m2 = pm.sample_prior_predictive(samples=15_000)

This code block above gives following errors:

ValueError: Nonzero only supports non-scalar arrays. → If pm.StudentT.dist has no shape argument.
TruncationError: Truncation did not converge in 10000 steps → when I use that second censored distribution (st_t_c)

But it works when it’s in another model:

with pm.Model() as m2_1:
    # repeat as in m2
    sT_nu_fixed = pm.ConstantData("sT_nu_fixed", 100)
    sT = pm.StudentT("sT", nu=sT_nu_fixed, mu=3, sigma=1)
    sT_d = pm.StudentT.dist(nu=sT_nu_fixed, mu=3, sigma=1, shape=1) # error without shape argument
    sT_t_d = pm.Truncated.dist(sT_d, lower=2)
    st_t_c = pm.Censored("sT_t_c", sT_t_d, lower=None, upper=5)
    idata_m2_1 = pm.sample_prior_predictive(samples=15_000)

and plotting:

az.plot_dist(idata_m2.prior.stack(samples=("chain", "draw")).n, color='m', label="n")
az.plot_dist(idata_m2.prior.stack(samples=("chain", "draw")).sT, color='k', label="sT")
az.plot_dist(idata_m2.prior.stack(samples=("chain", "draw")).sT_t, color='b', label="truncated-sT")
az.plot_dist(idata_m2.prior.stack(samples=("chain", "draw")).sT_c, color='g', label="censored-sT")
az.plot_dist(idata_m2_1.prior.stack(samples=("chain", "draw")).sT_t_c, color='r', label="censored-truncated-sT");

returns:

The inference part works fine:

with pm.Model() as m3:
    sigma = pm.Exponential("sigma", 1)
    mu = pm.Normal("mu", 3, 1)
    sT_nu_fixed = pm.ConstantData("sT_nu_fixed", 100)
    sT_d = pm.StudentT.dist(nu=sT_nu_fixed, mu=mu, sigma=sigma, shape=2)
    sT_t = pm.Truncated.dist(sT_d, lower=0)
    censored = pm.Censored("censored", sT_t, lower=None, upper=[3, 5], observed=[1, 3])
    idata_m3 = pm.sample()

Now my problem is solved, but I wanted to share those errors above, I could not make sense of them. Trying with another continuous distribution (VonMises) also returned the same error. Pasting the code below:

with pm.Model() as m4:
    vm = pm.VonMises("vm", mu=1, kappa=100)
    vm_dist = pm.VonMises.dist(mu=1, kappa=100, shape=1) # warning w/o shape argument ?
    vm_truncated = pm.Truncated("vm_truncated", vm_dist, lower=0.9)
    vm_truncated_dist = pm.Truncated.dist(vm_dist, lower=0.9)
    cs_vm = pm.Censored("cs_vm", vm_dist, lower=0.9, upper=1.2)
    # cs_vm_t = pm.Censored("cs_vm_t", vm_truncated_dist, lower=0, upper=3) # this gives error
    idata_m4 = pm.sample_prior_predictive(samples=10_000)
with pm.Model() as m5:
    vm = pm.VonMises("vm", mu=1, kappa=100)
    vm_dist = pm.VonMises.dist(mu=1, kappa=100, shape=1) # warning w/o shape argument ?
    vm_truncated_dist = pm.Truncated.dist(vm_dist, lower=0.9)
    cs_vm_t = pm.Censored("cs_vm_t", vm_truncated_dist, lower=0, upper=1.2)
    idata_m5 = pm.sample_prior_predictive(samples=10_000)
az.plot_dist(idata_m4.prior.stack(samples=("chain", "draw")).vm, color='k', label="vm")
az.plot_dist(idata_m4.prior.stack(samples=("chain", "draw")).vm_truncated, color='b', label="truncated-vm")
az.plot_dist(idata_m4.prior.stack(samples=("chain", "draw")).cs_vm, color='r', label="non-truncated-censored-vm")
az.plot_dist(idata_m5.prior.stack(samples=("chain", "draw")).cs_vm_t, color='g', label="truncated-censored-sT")

ricardoV94 · November 21, 2023, 10:01am

Can you share a fully reproducible example with the StudentT? Also, what version of PyMC are you using? Using the latest would be the best

alaz · November 21, 2023, 12:17pm

I think running m2 without commenting that line should produce the error. I was running on pymc 5.9.0, now updated to 5.9.2 but it persists. Following block should be fully reproducible:

import pymc as pm
with pm.Model() as m2:
    n = pm.Normal("n", 3, 1)
    sT_nu_fixed = pm.ConstantData("sT_nu_fixed", 100)
    sT = pm.StudentT("sT", nu=sT_nu_fixed, mu=3, sigma=1)
    sT_d = pm.StudentT.dist(nu=sT_nu_fixed, mu=3, sigma=1, shape=1) # error without shape argument
    sT_t = pm.Truncated("sT_t", sT_d, lower=2)
    sT_c = pm.Censored("sT_c", sT_d, lower=2, upper=5)
    sT_t_d = pm.Truncated.dist(sT_d, lower=2, shape=1)
    st_t_c = pm.Censored("sT_t_c", sT_t_d, lower=None, upper=5) # this line gives truncation error
    idata_m2 = pm.sample_prior_predictive(samples=15_000)

Python implementation: CPython
Python version : 3.11.6
IPython version : 8.17.2

numpy: 1.25.2
pymc : 5.9.2
arviz: 0.16.1

Watermark: 2.4.3

ricardoV94 · November 22, 2023, 8:42am

Thanks @alaz I could reproduce the problem with a simpler scenario. It has nothing to do with Censored but with identical Truncated distributions in the same model. You should see the problem go away if you change one of the parameters to be slightly different, like nu=100 in one of the truncated and nu=101 in the other.

I opened an issue in the PyMC repo: Conflicts between multiple identical Truncated distributions · Issue #7024 · pymc-devs/pymc · GitHub

alaz · November 22, 2023, 8:56am

Thank you very much!

Topic		Replies	Views
Truncated multivariate normal likelihood v5 modeling	10	77	May 30, 2025
Truncated normal distribution with optimal truncation thresholds via DensityDist Questions	8	3017	June 21, 2018
Truncated Inverse normal distribution (also known as Wald distribution) Questions	15	2903	April 27, 2018
Truncated observed data? Questions	6	2308	November 6, 2020
Issues with Truncated Normal version agnostic modeling	8	366	January 4, 2024

Censoring a truncated distribution

Censoring a truncated distribution

Generate Data

Inference model

Error

Question

Related topics