I’m trying to generalize a standard two-parameter logistic IRT model (with one-dimensional latent ability) to a multi-dimensional version. However, I’ve been stuck for days now with the slicing and shaping of the involved arrays, and would greatly appreciate some help.
Here’s the problem. The classic 2PL is

P(x_{ij} = 1 \mid \theta_{i}) = \frac{1}{1 + e^{D \alpha_{j} (\theta_{i} - \delta_{j})}}

where D = -1.7, \alpha_j and \delta_j are the discrimination and difficulty parameters, respectively, for item j = 1, 2, \ldots, n_{items}, while \theta_{i} is the latent ability for examinee i = 1, 2, \ldots, n_{persons}. The PyMC code to implement this is:
import numpy as np
import pymc as pm
n_persons = 100 # Number of persons
n_items = 25 # Number of items
# Random data for testing
responses = np.random.randint(0, 2, size=(n_persons, n_items))
with pm.Model() as IRT_2PL:
    # Priors for item parameters
    discrim = pm.LogNormal('discrim', mu=0, sigma=1, shape=n_items)
    diff = pm.Normal('diff', mu=0, sigma=1, shape=n_items)
    # Prior for person abilities
    theta = pm.Normal('theta', mu=0, sigma=1, shape=n_persons)
    # 2PL model: broadcasting gives p the shape (n_persons, n_items)
    p = pm.math.invlogit(1.7 * discrim * (theta[:, None] - diff))
    # Likelihood
    obs = pm.Bernoulli('obs', p=p, observed=responses)
    trace = pm.sample(2000)
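(As a sanity check on the broadcasting in this model, here is the linear predictor in plain NumPy, outside PyMC; the values are arbitrary, only the shapes matter:)

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 100, 25

discrim = rng.lognormal(size=n_items)   # (n_items,)
diff = rng.normal(size=n_items)         # (n_items,)
theta = rng.normal(size=n_persons)      # (n_persons,)

# theta[:, None] has shape (n_persons, 1); subtracting diff (n_items,)
# broadcasts to (n_persons, n_items), matching `responses`
z = 1.7 * discrim * (theta[:, None] - diff)
assert z.shape == (n_persons, n_items)
```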
The generalization to a multidimensional item response theory (MIRT) model is straightforward in linear algebra:

P(x_{ij} = 1 \mid \vec{\theta}_{i}) = \frac{1}{1 + e^{D \left( \vec{\alpha}_{j}^{\mathsf{T}} \vec{\theta}_{i} + \gamma_{j} \right)}}

where \vec{\alpha}_{j} = \begin{bmatrix} \alpha_{j1} & \alpha_{j2} \end{bmatrix}^{\mathsf{T}}, \vec{\theta}_{i} = \begin{bmatrix} \theta_{i1} & \theta_{i2} \end{bmatrix}^{\mathsf{T}}, and the scalar intercept \gamma_{j} = - \left( \alpha_{j1} \delta_{j1} + \alpha_{j2} \delta_{j2} \right) (more details in de Ayala, 2022, chapter 10).
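(To make the linear algebra concrete, here is a quick NumPy check, outside PyMC and with my own variable names, that the matrix form \vec{\alpha}_{j}^{\mathsf{T}} \vec{\theta}_{i} + \gamma_{j} equals the component-wise sum \alpha_{j1}(\theta_{i1} - \delta_{j1}) + \alpha_{j2}(\theta_{i2} - \delta_{j2}):)

```python
import numpy as np

rng = np.random.default_rng(42)
n_persons, n_items, n_dims = 100, 25, 2

alpha = rng.lognormal(size=(n_items, n_dims))   # discriminations
delta = rng.normal(size=(n_items, n_dims))      # difficulties
theta = rng.normal(size=(n_persons, n_dims))    # abilities

gamma = -(alpha * delta).sum(axis=1)            # (n_items,) intercepts

# Matrix form: theta @ alpha.T has shape (n_persons, n_items)
z_matrix = theta @ alpha.T + gamma

# Component-wise form, one term per dimension
z_crude = (alpha[:, 0] * (theta[:, [0]] - delta[:, 0])
           + alpha[:, 1] * (theta[:, [1]] - delta[:, 1]))

assert np.allclose(z_matrix, z_crude)
```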
I’m struggling to implement this in an elegant, vectorized way. A crude, non-vectorized version would obviously be:
with pm.Model() as crude_MIRT_2PL:
    # Priors for item parameters
    discrim = pm.LogNormal('discrim', mu=0, sigma=1, shape=n_items)
    discrim2 = pm.LogNormal('discrim2', mu=0, sigma=1, shape=n_items)
    diff = pm.Normal('diff', mu=0, sigma=1, shape=n_items)
    diff2 = pm.Normal('diff2', mu=0, sigma=1, shape=n_items)
    # Priors for person abilities
    theta = pm.Normal('theta', mu=0, sigma=1, shape=n_persons)
    theta2 = pm.Normal('theta2', mu=0, sigma=1, shape=n_persons)
    # 2PL model, one term per dimension
    p = pm.math.invlogit(1.7 * discrim * (theta[:, None] - diff)
                         + 1.7 * discrim2 * (theta2[:, None] - diff2))
    # Likelihood
    obs = pm.Bernoulli('obs', p=p, observed=responses)
But this non-vectorized version makes it much harder to later impose structural zeros on the discrimination parameters, i.e. to exclude some of the thetas from certain items. So ideally one would work with 2D arrays (matrices) for the item parameters and abilities, i.e.
# ...
# Priors for item parameters
discrim = pm.Normal('discrim', mu=0, sigma=1, shape=(n_items, 2))
diff = pm.Normal('diff', mu=0, sigma=1, shape=(n_items, 2))
# Priors for person abilities
theta = pm.Normal('theta', mu=0, sigma=1, shape=(n_persons, 2))
# ...
But for the life of me I can’t figure out how to slice and dice these arrays into the right shapes so that the p = ... line works. Any help would be appreciated.