EDIT 1: This model has been brought up here before. It seems the author of that post was using a custom Op for the normal CDF, but it should be possible to do this in vanilla theano/pymc3, since tt.erf exists.
EDIT 2: Based on the previous post, I have re-written the description and code to be more consistent with previous work. The error does not go away, but I get closer.
I’m trying to model ordinal data (e.g., Likert scale) with K possible ordinal values.
The ordered probit model approach from Kruschke (Chap. 23 in DBDA) assumes there is an underlying latent normal random variable with unknown mean \mu and scale \sigma, which is chopped into K intervals by K-1 thresholds \theta_1, \theta_2, \dots, \theta_{K-1}. Without loss of generality, the lowest and highest thresholds are fixed at \theta_1 \equiv 1.5 and \theta_{K-1} \equiv K - 0.5, and the middle thresholds are estimated.
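For concreteness, here is how the K category probabilities come out of CDF differences at the thresholds; a standalone sketch in plain numpy/scipy with made-up values for \mu and \sigma (not part of the model below):

import numpy as np
from scipy.stats import norm

K = 7
mu, sigma = 4.0, 1.5  # made-up latent mean and scale, for illustration only
# K - 1 equally spaced thresholds, padded with -inf/+inf so that adjacent
# CDF differences give exactly K probabilities
theta = np.concatenate([[-np.inf], np.linspace(1.5, K - 0.5, K - 1), [np.inf]])
cdf = norm.cdf((theta - mu) / sigma)   # Phi evaluated at each threshold
p = np.diff(cdf)                       # p_k = Phi(theta_k) - Phi(theta_{k-1})
print(p, p.sum())                      # K probabilities summing to 1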
Here is a minimal example:
import numpy as np
import pymc3 as pm
import theano.tensor as tt
import matplotlib.pyplot as plt

with pm.Model():
    K = 7
    data = np.random.randint(K, size=100)

    # K - 1 thresholds; the two outer ones are fixed, the middle ones are
    # masked so PyMC3 treats them as missing values to be estimated
    thresholds = np.linspace(1.5, K - .5, K - 1)
    obsthresh = np.ma.asarray(thresholds)
    obsthresh[1:-1] = np.ma.masked
    theta = pm.Normal("theta", mu=thresholds, sigma=K, shape=K - 1, observed=obsthresh)
    theta = tt.concatenate([[-np.inf], theta, [np.inf]])

    mu = pm.Normal(name="mu", mu=(1 + K) / 2, sigma=K)
    sig = pm.HalfCauchy(name="sig", beta=1)

    # normal CDF at each threshold; adjacent differences give the K category probabilities
    _p = 0.5 * (1 + pm.math.erf((theta - mu) / (sig * tt.sqrt(2))))
    p = _p[1:] - _p[0:-1]
    pm.Deterministic(name="p", var=p)

    # likelihood of the ordinal data
    pm.Categorical(name='y', p=p, observed=data)

    trace = pm.sample(chains=2)  # adding `step=pm.Slice()` here fixes "bad initial energy"

pm.traceplot(trace)
plt.show()
This works nicely with slice sampling but not with NUTS, and I don’t know why. Two possibilities come to mind:
- There is an erf in the model.
- The lack of a monotonicity constraint on the middle thresholds (see the sketch at the end of this post).
I can’t figure out if either or both of these are real problems. Any suggestions?
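For reference on the second point, here is an untested sketch of how I imagine a monotonicity constraint could be imposed, using PyMC3's ordered transform on free middle thresholds instead of the masked-array trick (it assumes K, thresholds, pm and tt from the model above, and would replace the two theta lines):

# Untested sketch: K - 3 free interior thresholds with an ordering constraint.
# The two outer thresholds stay fixed at 1.5 and K - 0.5.
theta_mid = pm.Normal(
    "theta_mid",
    mu=thresholds[1:-1],
    sigma=K,
    shape=K - 3,
    transform=pm.distributions.transforms.ordered,
    testval=thresholds[1:-1],
)
theta = tt.concatenate([[-np.inf, 1.5], theta_mid, [K - 0.5, np.inf]])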