# Help with logp of a threshold model

Imagine a data set with binary observations $\boldsymbol{y}$ and a matrix of predictors $\boldsymbol{X}$. A common way to model such data is with logistic regression, which would look something like:

$$\boldsymbol{\pi}=\mathrm{logistic}\left(\boldsymbol{X}\boldsymbol{\beta}\right)\\
\boldsymbol{y}\sim\mathrm{Bernoulli}\left(\boldsymbol{\pi}\right)$$

Basically, a linear model is placed on the logit of $\boldsymbol{\pi}$, which is assumed to be the underlying probability of success on a Bernoulli trial. If $\boldsymbol{y}$ were a disease, we would be placing the linear model on the log odds of developing the disease.

However in many fields of applied research, in particular modeling diseases, a liability or threshold model is used instead. Here, a linear model is placed on a latent variable which, when some threshold is exceeded, causes the disease. The latent variable is usually assumed to be normal with unit variance and the threshold is usually 0.

$$\boldsymbol{\psi}\sim\mathrm{Normal}\left(\boldsymbol{X}\boldsymbol{\beta},1\right)\\
\boldsymbol{y}=\begin{cases} 0 & \textrm{if }\boldsymbol{\psi}\le0 \\ 1 & \textrm{if }\boldsymbol{\psi}>0 \end{cases}$$
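As a side note, integrating out $\boldsymbol{\psi}$ shows that this threshold model implies $P(y=1)=\Phi(\boldsymbol{X}\boldsymbol{\beta})$, i.e. a probit link. A quick NumPy sanity check (the linear-predictor values `eta` are made up for illustration):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

# Hypothetical values of the linear predictor X @ beta for three subjects.
eta = np.array([-1.0, 0.0, 1.5])

# Simulate the threshold model: latent psi ~ Normal(eta, 1), y = 1 if psi > 0.
psi = rng.normal(loc=eta, scale=1.0, size=(200_000, eta.size))
y = (psi > 0).astype(int)

# Standard normal CDF via the error function.
def phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

expected = np.array([phi(e) for e in eta])
print(expected)         # ≈ [0.159, 0.5, 0.933]
print(y.mean(axis=0))   # simulated frequencies, close to the line above
```

The simulated frequencies of $y=1$ match $\Phi(\boldsymbol{X}\boldsymbol{\beta})$, which is what the probit link predicts.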

Given that the second model is actually quite simple, I feel like it should be doable in PyMC3. However, I’m not sure what the likelihood function should look like. Does it need a new distribution class?

A direct translation of the notation would be something like:

```python
import pymc3 as pm
import theano.tensor as tt

eps = 1e-5  # <== set it to 0 to have the exact notation, but
            #     you might run into problems sampling
with pm.Model() as m:
    beta = ...
    psi = pm.Normal('psi', X @ beta, 1.)
    observed = pm.Bernoulli('y', tt.switch(psi <= 0, eps, 1. - eps), observed=y)
```


However, a switch breaks the gradient, so you might run into problems sampling with NUTS; you should replace the switch with a smoothstep function.
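To illustrate the smoothstep idea, here is a minimal NumPy sketch (the steepness parameter `k` and the helper name are my own, not from the original post): a steep sigmoid approximates the hard 0/1 threshold while remaining differentiable everywhere.

```python
import numpy as np

def smooth_step(psi, k=50.0):
    """Smooth, differentiable approximation of the hard threshold 1[psi > 0].

    As the steepness k grows, 1 / (1 + exp(-k * psi)) approaches the step
    function while keeping a nonzero gradient, so NUTS can still sample.
    """
    return 1.0 / (1.0 + np.exp(-k * psi))

psi = np.array([-2.0, -0.1, 0.1, 2.0])
print(smooth_step(psi))  # close to [0, 0, 1, 1]
```

In the model above, one could then use something like `pm.math.sigmoid(k * psi)` as the Bernoulli probability instead of the `tt.switch` expression.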