Censoring a truncated distribution

Hello,

I’m trying to implement a right-censored model. However, I need the distribution which will be censored to be a normal distribution left-truncated at zero. I’m stuck with the following error.

NotImplementedError: LogCDF method not implemented for truncated_normal_rv{0, (0, 0, 0, 0), floatX, False}

What I understand is that there’s no logcdf defined for the truncated normal, but I saw a merged pull request here, and it seems that logcdf is in fact implemented?

Censoring a truncated distribution

Following is a prior predictive illustration:

import pymc as pm
import arviz as az
import numpy as np
from copy import copy
with pm.Model() as m:
    a_normal = pm.Normal("n", 1, 1)
    normal_dist = pm.Normal.dist(mu=1.0, sigma=1.0)
    truncated_normal = pm.Truncated("truncated_normal", normal_dist, lower=0)
    truncated_normal_dist = pm.Truncated.dist(normal_dist, lower=0)
    cs_normal = pm.Censored("cs_n", normal_dist, lower=0, upper=3)
    cs_truncated_normal = pm.Censored("cs_tn", truncated_normal_dist, lower=None, upper=3)

with m:
    idata = pm.sample_prior_predictive(samples=10000)

az.plot_dist(idata.prior.stack(samples=("chain", "draw")).n, color='k', label="the normal")
az.plot_dist(idata.prior.stack(samples=("chain", "draw")).truncated_normal, color='b', label="lower-truncated at 0")
az.plot_dist(idata.prior.stack(samples=("chain", "draw")).cs_tn, color='r', label="truncated-censored")
az.plot_dist(idata.prior.stack(samples=("chain", "draw")).cs_n, color='g', label="non-truncated-censored")

Which will output:

So, I can’t use a normal distribution (black) and censor it from both sides (green), what I need is a truncated distribution (blue) to be right-censored (red).

Generate Data

N = 1  # one trial is enough to illustrate the problem
mu, sigma = 1, 2
upper_bounds = np.random.randint(0, 3, N)
y = np.random.normal(loc=mu, scale=sigma, size=N)

def right_censor(y, upper_bounds):
    cy = copy(y)
    censor_index = cy >= upper_bounds
    cy[censor_index] = upper_bounds[censor_index]
    return cy

cy = right_censor(y, upper_bounds)

print(y, upper_bounds, cy)

Inference model

with pm.Model() as m1:
    mu = pm.Normal("mu", 0, 1)
    sigma = pm.Normal("sigma", 0, 1)
    normal_ = pm.Normal.dist(mu, sigma)
    trunc_normal = pm.Truncated.dist(normal_, lower=0)
    censored = pm.Censored("censored", trunc_normal, lower=None, upper=upper_bounds, observed=cy)

Error

Following gives the error.

with m1:
    idata_m1 = pm.sample()
NotImplementedError: LogCDF method not implemented for truncated_normal_rv{0, (0, 0, 0, 0), floatX, False}
  • The watermark:

Last updated: Thu Nov 09 2023

Python implementation: CPython
Python version : 3.11.6
IPython version : 8.16.1

scipy : 1.11.3
arviz : 0.16.1
pytensor : 2.13.1
pymc : 5.9.0
numpy : 1.25.2
matplotlib: 3.8.0
pandas : 2.1.1

Watermark: 2.4.3

Question

So I want to ask:

  • Is this the correct approach? Is this error really because the logcdf is not implemented, or is there a user error in model creation part?
  • What may be a workaround if it’s not implemented yet? What would you suggest?

TruncatedNormal is a specialized distribution. The PR you saw implemented logcdf for all the other non-specialized cases, but leaves TruncatedNormal still without a logcdf. I’ll open an issue to add it.

1 Like

As silly as it may be that means you can use a StudentT with large fixed nu, to get what you want. That will behaves basically like a Normal, except the likelihood could be a bit slower.

Thank you very much for your answers Ricardo. I tried your solution and it works. As I try to replicate the prior part with pm.StudentT I got another error.

with pm.Model() as m2:
    n = pm.Normal("n", 3, 1)
    sT_nu_fixed = pm.ConstantData("sT_nu_fixed", 100)
    sT = pm.StudentT("sT", nu=sT_nu_fixed, mu=3, sigma=1)
    sT_d = pm.StudentT.dist(nu=sT_nu_fixed, mu=3, sigma=1, shape=1) # error without shape argument
    sT_t = pm.Truncated("sT_t", sT_d, lower=2)
    sT_c = pm.Censored("sT_c", sT_d, lower=2, upper=5)
    sT_t_d = pm.Truncated.dist(sT_d, lower=2, shape=1)
    # st_t_c = pm.Censored("sT_t_c", sT_t_d, lower=None, upper=5) # this line gives truncation error
    idata_m2 = pm.sample_prior_predictive(samples=15_000)

This code block above gives following errors:

  • ValueError: Nonzero only supports non-scalar arrays. → If pm.StudentT.dist has no shape argument.
  • TruncationError: Truncation did not converge in 10000 steps → when I use that second censored distribution (st_t_c)

But it works when it’s in another model:

with pm.Model() as m2_1:
    # repeat as in m2
    sT_nu_fixed = pm.ConstantData("sT_nu_fixed", 100)
    sT = pm.StudentT("sT", nu=sT_nu_fixed, mu=3, sigma=1)
    sT_d = pm.StudentT.dist(nu=sT_nu_fixed, mu=3, sigma=1, shape=1) # error without shape argument
    sT_t_d = pm.Truncated.dist(sT_d, lower=2)
    st_t_c = pm.Censored("sT_t_c", sT_t_d, lower=None, upper=5)
    idata_m2_1 = pm.sample_prior_predictive(samples=15_000)

and plotting:

az.plot_dist(idata_m2.prior.stack(samples=("chain", "draw")).n, color='m', label="n")
az.plot_dist(idata_m2.prior.stack(samples=("chain", "draw")).sT, color='k', label="sT")
az.plot_dist(idata_m2.prior.stack(samples=("chain", "draw")).sT_t, color='b', label="truncated-sT")
az.plot_dist(idata_m2.prior.stack(samples=("chain", "draw")).sT_c, color='g', label="censored-sT")
az.plot_dist(idata_m2_1.prior.stack(samples=("chain", "draw")).sT_t_c, color='r', label="censored-truncated-sT");

returns:
image

The inference part works fine:

with pm.Model() as m3:
    sigma = pm.Exponential("sigma", 1)
    mu = pm.Normal("mu", 3, 1)
    sT_nu_fixed = pm.ConstantData("sT_nu_fixed", 100)
    sT_d = pm.StudentT.dist(nu=sT_nu_fixed, mu=mu, sigma=sigma, shape=2)
    sT_t = pm.Truncated.dist(sT_d, lower=0)
    censored = pm.Censored("censored", sT_t, lower=None, upper=[3, 5], observed=[1, 3])
    idata_m3 = pm.sample()

Now my problem is solved, but I wanted to share those errors above, I could not make sense of them. Trying with another continuous distribution (VonMises) also returned the same error. Pasting the code below:

with pm.Model() as m4:
    vm = pm.VonMises("vm", mu=1, kappa=100)
    vm_dist = pm.VonMises.dist(mu=1, kappa=100, shape=1) # warning w/o shape argument ?
    vm_truncated = pm.Truncated("vm_truncated", vm_dist, lower=0.9)
    vm_truncated_dist = pm.Truncated.dist(vm_dist, lower=0.9)
    cs_vm = pm.Censored("cs_vm", vm_dist, lower=0.9, upper=1.2)
    # cs_vm_t = pm.Censored("cs_vm_t", vm_truncated_dist, lower=0, upper=3) # this gives error
    idata_m4 = pm.sample_prior_predictive(samples=10_000)
with pm.Model() as m5:
    vm = pm.VonMises("vm", mu=1, kappa=100)
    vm_dist = pm.VonMises.dist(mu=1, kappa=100, shape=1) # warning w/o shape argument ?
    vm_truncated_dist = pm.Truncated.dist(vm_dist, lower=0.9)
    cs_vm_t = pm.Censored("cs_vm_t", vm_truncated_dist, lower=0, upper=1.2)
    idata_m5 = pm.sample_prior_predictive(samples=10_000)
az.plot_dist(idata_m4.prior.stack(samples=("chain", "draw")).vm, color='k', label="vm")
az.plot_dist(idata_m4.prior.stack(samples=("chain", "draw")).vm_truncated, color='b', label="truncated-vm")
az.plot_dist(idata_m4.prior.stack(samples=("chain", "draw")).cs_vm, color='r', label="non-truncated-censored-vm")
az.plot_dist(idata_m5.prior.stack(samples=("chain", "draw")).cs_vm_t, color='g', label="truncated-censored-sT")

Can you share a fully reproducible example with the StudentT? Also, what version of PyMC are you using? Using the latest would be the best

I think running m2 without commenting that line should produce the error. I was running on pymc 5.9.0, now updated to 5.9.2 but it persists. Following block should be fully reproducible:

import pymc as pm
with pm.Model() as m2:
    n = pm.Normal("n", 3, 1)
    sT_nu_fixed = pm.ConstantData("sT_nu_fixed", 100)
    sT = pm.StudentT("sT", nu=sT_nu_fixed, mu=3, sigma=1)
    sT_d = pm.StudentT.dist(nu=sT_nu_fixed, mu=3, sigma=1, shape=1) # error without shape argument
    sT_t = pm.Truncated("sT_t", sT_d, lower=2)
    sT_c = pm.Censored("sT_c", sT_d, lower=2, upper=5)
    sT_t_d = pm.Truncated.dist(sT_d, lower=2, shape=1)
    st_t_c = pm.Censored("sT_t_c", sT_t_d, lower=None, upper=5) # this line gives truncation error
    idata_m2 = pm.sample_prior_predictive(samples=15_000)

Python implementation: CPython
Python version : 3.11.6
IPython version : 8.17.2

numpy: 1.25.2
pymc : 5.9.2
arviz: 0.16.1

Watermark: 2.4.3

Thanks @alaz I could reproduce the problem with a simpler scenario. It has nothing to do with Censored but with identical Truncated distributions in the same model. You should see the problem go away if you change one of the parameters to be slightly different, like nu=100 in one of the truncated and nu=101 in the other.

I opened an issue in the PyMC repo: Conflicts between multiple identical Truncated distributions · Issue #7024 · pymc-devs/pymc · GitHub

1 Like

Thank you very much!