Censored linear regression with non-constant censoring threshold

Jack_Medley · May 24, 2020, 2:43pm

Hi all,

I’ve been reading a little about using EM to fit regression models with some censored data. I wondered about using pymc3 to fit the same data to compare the results between the two methods but am having some difficulties.

The data and some visualisations of the data can be found here:

https://jackmedley.gitlab.io/blog/post/em_algorithm_2/

In summary: some linear data is generated and then censored above a threshold line. The threshold line has a non-zero gradient, so the censoring value is different for each data point.

I have tried implementing this in pymc3 as follows:

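import pymc3 as pm

# xx, yy: arrays holding the data from the blog post linked above
# censored: boolean mask, True where the point was censored
with pm.Model() as model:
    b0 = pm.Normal('b0', 0.0, 100.0)
    b1 = pm.Normal('b1', 0.0, 100.0)
    sigma = pm.HalfCauchy('sigma', 100.0)
    # One latent Z per censored point, truncated below at the recorded value
    zz = [
        pm.TruncatedNormal(f'Z_{itr}', b0 + b1 * x, sigma, lower=y)
        for itr, (x, y) in enumerate(zip(xx[censored], yy[censored]))
    ]
    # Uncensored points get an ordinary Normal likelihood
    pm.Normal('y_observed', b0 + b1 * xx[~censored], sigma, observed=yy[~censored])

    trace = pm.sample(1000, tune=1000, cores=16)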

As you can see, I’ve drawn each z (which, in the notation of the link above, is the latent variable we don’t get to observe) from a truncated normal which ‘starts’ at the censored value, and I’ve drawn the observed (uncensored) y values from a Normal distribution.

When I sample this it takes quite a lot longer than an ordinary linear regression, presumably because I’ve added so many extra z variables, but the result is the same as I get when I run a regular linear regression on only the uncensored points:

with pm.Model() as model:
    b0, b1 = pm.Normal('b0', 0.0, 100.0), pm.Normal('b1', 0.0, 100.0)
    sigma = pm.HalfCauchy('sigma', 100.0)
    pm.Normal('Y', b0 + b1 * xx[~censored], sigma, observed=yy[~censored])
    trace = pm.sample(1000, tune=1000, cores=16)

Any tips for where I’m going wrong would be much appreciated. And if anyone knows a way of avoiding my zz hack that would be great too!

Thanks!
Jack

DanWeitzenfeld · May 25, 2020, 2:08am

Hi Jack,
Check out this gist:

https://gist.github.com/DanielWeitzenfeld/e4ab8c63d106d7b84bcc5841de9cceb8 (Censored Regression.ipynb)

And this page from the Stan docs:

Let me know if you have any questions
-Dan
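
For readers who cannot open the gist, here is a minimal sketch of the idea the Stan docs describe for censored data: integrate the censored values out, so that each censored point contributes log P(Y > threshold) to the log-likelihood instead of a latent variable. This is an illustration in plain Python and not the contents of the gist; the helper names (normal_logpdf, normal_lccdf, censored_loglik) are made up for this example. It also suggests why the model in the question matched the uncensored-only fit: a normalised truncated density integrates to 1 over the latent z, so the Z_i terms carry no information about b0, b1 or sigma.

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def normal_lccdf(c, mu, sigma):
    """log P(Y > c) for Y ~ Normal(mu, sigma).

    Uses erfc directly, so it underflows for thresholds very far
    above mu; good enough for a sketch.
    """
    z = (c - mu) / sigma
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

def censored_loglik(b0, b1, sigma, xx, yy, censored):
    """Log-likelihood of a right-censored linear regression.

    For censored points, yy holds the per-point threshold, which
    may differ from point to point.
    """
    total = 0.0
    for x, y, is_censored in zip(xx, yy, censored):
        mu = b0 + b1 * x
        if is_censored:
            total += normal_lccdf(y, mu, sigma)   # only know Y > y
        else:
            total += normal_logpdf(y, mu, sigma)  # Y observed exactly
    return total

# Toy data (hypothetical): the second point is censored at 2.0
print(censored_loglik(0.0, 1.0, 1.0, [1.0, 2.0], [1.0, 2.0], [False, True]))
```

In pymc3 a term like the censored branch is usually added to the model with pm.Potential rather than through per-point latent variables, which also avoids the slowdown from sampling every Z_i.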

Jack_Medley · May 26, 2020, 8:36am

Hi Dan,

Thanks for your help and the reference - that clears it up.

Jack
