# Help with model specification

Hello,

I’m trying to implement a model for ordered response data, where ratings are collected using a survey with a Likert-type rating scale (1-5).

The model proposed by Muraki (1990) is a modification of the Graded Response Model (Samejima, 1969). The probability of a response option k \in {1, ..., K} is defined as:

\begin{split}p(X_{ij} = k) = \left\{ \begin{array}{l} 1 - P_{ij1}^* \,, \text{if } k = 1 \\ P_{ij,k-1}^* - P_{ijk}^* \,, \text{if } 1 < k < K \\ P_{ijk}^* \,, \text{if } k = K \\ \end{array} \right.\end{split}

where the cumulative probability of an observed response X being higher than k depends on the person ability \theta, the question difficulty \beta and the option threshold \delta:

P_{ijk}^* = p(X_{ij} > k \mid \theta_i, \alpha_j, \beta_j, \delta_k) = \text{logit}^{-1}(\alpha_j (\theta_i - \beta_j - \delta_k))

Code to generate artificial data:

import numpy as np

def sigmoid(x):
return 1 / (1 + np.exp(-x))

# Generate data
n_persons = 500
n_questions = 20
n_options = 5

# Latent variables
true_theta = np.random.normal(loc=0, scale=1, size=n_persons)
true_beta = np.random.normal(loc=0, scale=1, size=n_questions)
true_alpha = np.random.normal(loc=1, scale=0.1, size=n_questions)
true_delta = np.array([-1.5, -0.5, 0.5, 1.5])

# Cumulative probabilities
p_x_over_1 = np.zeros(shape=(n_persons, n_questions))
p_x_over_2 = np.zeros(shape=(n_persons, n_questions))
p_x_over_3 = np.zeros(shape=(n_persons, n_questions))
p_x_over_4 = np.zeros(shape=(n_persons, n_questions))

for j in range(n_questions):
eta1 = true_alpha[j] * (true_theta - true_beta[j] - true_delta)
p_x_over_1[:, j] = sigmoid(eta1)

eta2 = true_alpha[j] * (true_theta - true_beta[j] - true_delta)
p_x_over_2[:, j] = sigmoid(eta2)

eta3 = true_alpha[j] * (true_theta - true_beta[j] - true_delta)
p_x_over_3[:, j] = sigmoid(eta3)

eta4 = true_alpha[j] * (true_theta - true_beta[j] - true_delta)
p_x_over_4[:, j] = sigmoid(eta4)

# Likelihoods
p_x_1 = 1 - p_x_over_1
p_x_2 = p_x_over_1 - p_x_over_2
p_x_3 = p_x_over_2 - p_x_over_3
p_x_4 = p_x_over_3 - p_x_over_4
p_x_5 = p_x_over_4

# Likelihood 3-dimensional array
# n_options x n_persons x n_questions
p_x = np.array((p_x_1, p_x_2, p_x_3, p_x_4, p_x_5))

# Simulate response per person per question
x = np.zeros(shape=(n_persons, n_questions)).astype(int)
for i in range(n_persons):
for j in range(n_questions):
x[i, j] = np.random.choice(a=[0,1,2,3,4], size=1, p=p_x[:, i, j])

# Response matrix to vector
x_obs= x.flatten('C')
person_index = np.repeat(range(n_persons), n_questions)
question_index = np.tile(range(n_questions), n_persons)


I’ve had some success with the model specification below. It runs fine using MAP and MCMC, but when trying to use ADVI, I keep getting errors.

# Define model
with pm.Model() as grm_model:

# Person variable
theta = pm.Normal(
'theta', mu=0, sd=1,
shape=n_persons
)

# Question variables
beta = pm.Normal(
'beta', mu=0, sd=1,
shape=n_questions
)
alpha = pm.TruncatedNormal(
'alpha', mu=1, sd=0.1, lower=0,
shape=n_questions
)
delta = pm.Normal(
'delta', mu=[-1.5, -0.5, 0.5, 1.5], sd=0.1,
transform=pm.distributions.transforms.ordered,
testval=[-1.5, -0.5, 0.5, 1.5],
shape=4
)

# Likelihood = Item Response Function
eta = alpha[question_index] * (theta[person_index] - beta[question_index])

# Outcome
x = pm.OrderedLogistic(
'x', eta=eta, cutpoints=delta,
observed=x_obs,
testval=np.random.randint(4, size=n_persons * n_questions),
shape=(n_persons * n_questions)
)


Running this code produces FloatingPointError: NaN occurred in optimization.

with grm_model:


Running this code works fine:

with grm_model:
trace = pm.sample(
draws=500,
tune=500,
chains=4,

When removing the alpha parameter from the model, ADVI runs fine. I have tried experimenting with different distributions for alpha, such as HalfCauchy and Lognormal, but the error persists.
What could be the issue? Am I doing something wrong in defining the likelihood eta, or the way I’m using OrderedLogistic?