EDIT 1: This model has been brought up here before. It seems the author of that post was using a custom Op for the normal CDF, but it should be possible to do this in vanilla theano/pymc3, since tt.erf exists.
EDIT 2: Based on the previous post, I have re-written the description and code to be more consistent with previous work. The error does not go away, but I get closer.
I’m trying to model ordinal data (e.g., Likert scale) with K possible ordinal values.
The ordered probit model approach from Kruschke (Chap. 23 in DBDA) assumes there is an underlying latent normal random variable with unknown mean \mu and scale \sigma, which is chopped into K intervals by K-1 thresholds \theta_1, \theta_2, \dots, \theta_{K-1}. Without loss of generality, the lowest and highest thresholds are fixed at \theta_1 \equiv 1.5 and \theta_{K-1} \equiv K - 0.5, and the middle thresholds are estimated.
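For concreteness, here is how the K category probabilities come out of CDF differences at the thresholds; a standalone sketch in plain numpy/scipy with made-up values for \mu and \sigma (not part of the model below):

import numpy as np
from scipy.stats import norm

K = 7
mu, sigma = 4.0, 1.5  # made-up latent mean and scale, for illustration only
# K - 1 equally spaced thresholds, padded with -inf/+inf so that adjacent
# CDF differences give exactly K probabilities
theta = np.concatenate([[-np.inf], np.linspace(1.5, K - 0.5, K - 1), [np.inf]])
cdf = norm.cdf((theta - mu) / sigma)   # Phi evaluated at each threshold
p = np.diff(cdf)                       # p_k = Phi(theta_k) - Phi(theta_{k-1})
print(p, p.sum())                      # K probabilities summing to 1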
Here is a minimal example:
import numpy as np
import pymc3 as pm
import theano.tensor as tt
import matplotlib.pyplot as plt

with pm.Model():
    K = 7
    data = np.random.randint(K, size=100)

    # K - 1 thresholds; the two outer ones are fixed, the middle ones are
    # masked so PyMC3 treats them as missing values to be estimated
    thresholds = np.linspace(1.5, K - .5, K - 1)
    obsthresh = np.ma.asarray(thresholds)
    obsthresh[1:-1] = np.ma.masked
    theta = pm.Normal("theta", mu=thresholds, sigma=K, shape=K - 1, observed=obsthresh)
    theta = tt.concatenate([[-np.inf], theta, [np.inf]])

    mu = pm.Normal(name="mu", mu=(1 + K) / 2, sigma=K)
    sig = pm.HalfCauchy(name="sig", beta=1)

    # normal CDF at each threshold; adjacent differences give the K category probabilities
    _p = 0.5 * (1 + pm.math.erf((theta - mu) / (sig * tt.sqrt(2))))
    p = _p[1:] - _p[0:-1]
    pm.Deterministic(name="p", var=p)

    # likelihood of the ordinal data
    pm.Categorical(name='y', p=p, observed=data)

    trace = pm.sample(chains=2)  # adding `step=pm.Slice()` here fixes "bad initial energy"

pm.traceplot(trace)
plt.show()
This works nicely with slice sampling but not with NUTS, and I don’t know why. Two possibilities come to mind:
- There is an erf in the model.
- The lack of a monotonicity constraint on the middle thresholds (see the sketch at the end of this post).
I can’t figure out if either or both of these are real problems. Any suggestions?
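For reference on the second point, here is an untested sketch of how I imagine a monotonicity constraint could be imposed, using PyMC3's ordered transform on free middle thresholds instead of the masked-array trick (it assumes K, thresholds, pm and tt from the model above, and would replace the two theta lines):

# Untested sketch: K - 3 free interior thresholds with an ordering constraint.
# The two outer thresholds stay fixed at 1.5 and K - 0.5.
theta_mid = pm.Normal(
    "theta_mid",
    mu=thresholds[1:-1],
    sigma=K,
    shape=K - 3,
    transform=pm.distributions.transforms.ordered,
    testval=thresholds[1:-1],
)
theta = tt.concatenate([[-np.inf, 1.5], theta_mid, [K - 0.5, np.inf]])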