Correct shapes in multidimensional item response 2PL

I’m trying to generalize a standard two-parameter logistic IRT model (with one-dimensional latent ability) to a multi-dimensional version. However, I’ve been stuck for days now with the slicing and shaping of the involved arrays, and would greatly appreciate some help.

Here’s the problem. The classic 2PL is

P(x_{ij}=1) = \frac{1}{1 + e^{D \alpha_j (\theta_i - \delta_j)}}

where D=-1.7, \alpha_j and \delta_j are the discrimination and difficulty parameters, respectively, for item j = 1, 2, \ldots, n_{items}, while \theta_{i} is the latent ability for examinee i = 1, 2, \ldots, n_{persons}. The PyMC code to implement this is:

import numpy as np
import pymc as pm

n_persons = 100  # Number of persons
n_items = 25  # Number of items

# Random data for testing
responses = np.random.randint(0, 2, size=(n_persons, n_items))

with pm.Model() as IRT_2PL:
    # Parameters priors
    discrim = pm.LogNormal('discrim', mu=0, sigma=1, shape=n_items)
    diff = pm.Normal('diff', mu=0, sigma=1, shape=n_items)    

    # Abilities prior
    theta = pm.Normal('theta', mu=0, sigma=1, shape=n_persons)

    # 2PL Model
    p = pm.math.invlogit(1.7 * discrim * (theta[:, None] - diff))

    # Likelihood
    obs = pm.Bernoulli('obs', p=p, observed=responses)

    trace = pm.sample(2000)

The generalization to a multidimensional item response theory (MIRT) model is straightforward in linear algebra:

P(x_{ij}=1) = \frac{1}{1 + e^{D \left[ \alpha_{j1} (\theta_{i1}-\delta_{j1}) + \alpha_{j2} (\theta_{i2}-\delta_{j2}) \right] }} = \frac{1}{1 + e^{D \left( \vec{\alpha}_{j}^{\mathsf{T}} \vec{\theta}_{i} + \gamma_{j} \right) }}

where \vec{\alpha}_{j} = \begin{bmatrix} \alpha_{j1} & \alpha_{j2} \end{bmatrix}^{\mathsf{T}}, \vec{\theta}_{i} = \begin{bmatrix} \theta_{i1} & \theta_{i2} \end{bmatrix}^{\mathsf{T}}, and \gamma_{j} = - \left( \alpha_{j1} \delta_{j1} + \alpha_{j2} \delta_{j2} \right) is a scalar intercept (more details in de Ayala, 2022, chapter 10).
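As a quick sanity check of this identity, here is a minimal NumPy sketch (with arbitrary made-up values for a single item j and person i) verifying that \alpha_{j1} (\theta_{i1}-\delta_{j1}) + \alpha_{j2} (\theta_{i2}-\delta_{j2}) equals \vec{\alpha}_{j}^{\mathsf{T}} \vec{\theta}_{i} + \gamma_{j}:

import numpy as np

rng = np.random.default_rng(0)
alpha = rng.lognormal(size=2)  # discrimination vector for one item j
delta = rng.normal(size=2)     # difficulty vector for one item j
theta = rng.normal(size=2)     # ability vector for one person i

lhs = np.sum(alpha * (theta - delta))  # elementwise form
gamma = -np.sum(alpha * delta)         # scalar intercept gamma_j
rhs = alpha @ theta + gamma            # linear-algebra form

print(np.isclose(lhs, rhs))  # True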

I’m struggling to implement this in an elegant, vectorized way. A crude version would obviously be:

with pm.Model() as crude_MIRT_2PL:
    # Parameters priors
    discrim = pm.LogNormal('discrim', mu=0, sigma=1, shape=n_items)
    discrim2 = pm.LogNormal('discrim2', mu=0, sigma=1, shape=n_items)
    diff = pm.Normal('diff', mu=0, sigma=1, shape=n_items)
    diff2 = pm.Normal('diff2', mu=0, sigma=1, shape=n_items)    

    # Abilities prior
    theta = pm.Normal('theta', mu=0, sigma=1, shape=n_persons)
    theta2 = pm.Normal('theta2', mu=0, sigma=1, shape=n_persons)

    # 2PL Model    
    p = pm.math.invlogit(1.7 * discrim * (theta[:, None] - diff) + 1.7 * discrim2 * (theta2[:, None] - diff2))

    # Likelihood
    obs = pm.Bernoulli('obs', p=p, observed=responses)

But this non-vectorized version makes it so much harder to later implement structural zeros among the discrimination parameters, to exclude some of the thetas from certain items. So ideally one would like to work with 2D arrays (matrices) for the parameters and abilities, i.e.

# ...
    # Priors for item parameters
    discrim = pm.Normal('discrim', mu=0, sigma=1, shape=(n_items, 2))
    diff = pm.Normal('diff', mu=0, sigma=1, shape=(n_items, 2))    

    # Priors for person abilities
    theta = pm.Normal('theta', mu=0, sigma=1, shape=(n_persons, 2))

# ...

But for the life of me I can’t figure out how to slice and dice these arrays in the right shape so that the p=... equation works. Any help would be appreciated.

The following works:

p = pm.math.invlogit(1.7 * discrim[:, 0] * (theta[:, 0, None] - diff[:, 0]) +
                     1.7 * discrim[:, 1] * (theta[:, 1, None] - diff[:, 1]))

which you can check by evaluating both the non-vectorized expression and this one for fixed initial values, as below:

import pymc as pm
import numpy as np

n_items = 10
n_persons = 2


responses = np.random.normal(0, 1, 10)

_discrim1 = np.random.lognormal(0, 1, size=n_items)
_discrim2 = np.random.lognormal(0, 1, size=n_items)
_discrim = np.array([_discrim1, _discrim2]).T

_diff1 = np.random.normal(0, 1, size=n_items)
_diff2 = np.random.normal(0, 1, size=n_items)
_diff = np.array([_diff1, _diff2]).T

_theta1 = np.random.normal(0, 1, size=n_persons)
_theta2 = np.random.normal(0, 1, size=n_persons)
_theta = np.array([_theta1, _theta2]).T

with pm.Model() as crude_MIRT_2PL:
    # Parameters priors
    discrim1 = pm.LogNormal('discrim1', mu=0, sigma=1, shape=n_items)
    discrim2 = pm.LogNormal('discrim2', mu=0, sigma=1, shape=n_items)
    diff1 = pm.Normal('diff1', mu=0, sigma=1, shape=n_items)
    diff2 = pm.Normal('diff2', mu=0, sigma=1, shape=n_items)

    #Abilities prior
    theta1 = pm.Normal('theta1', mu=0, sigma=1, shape=n_persons)
    theta2 = pm.Normal('theta2', mu=0, sigma=1, shape=n_persons)

    p1 = pm.math.invlogit(1.7 * discrim1 * (theta1[:, None] - diff1) +
                          1.7 * discrim2 * (theta2[:, None] - diff2))

    discrim = pm.Normal('discrim', mu=0, sigma=1, shape=(n_items, 2))
    diff = pm.Normal('diff', mu=0, sigma=1, shape=(n_items, 2))

    # Priors for person abilities
    theta = pm.Normal('theta', mu=0, sigma=1, shape=(n_persons, 2))

    # 2PL Model
    p2 = pm.math.invlogit(1.7 * discrim[:, 0] * (theta[:, 0, None] - diff[:, 0]) +
                          1.7 * discrim[:, 1] * (theta[:, 1, None] - diff[:, 1]))

    _p1 = p1.eval({discrim1:_discrim1, discrim2:_discrim2,
                   diff1:_diff1, diff2:_diff2,
                   theta1:_theta1, theta2:_theta2})

    _p2 = p2.eval({discrim:_discrim, diff:_diff, theta:_theta})

    print(_p1-_p2)

I think you can vectorize this further, but I need to think about it a bit more.

Okay, the ultimate vectorization can be achieved via

p = pm.math.invlogit((1.7 * discrim * (theta[:, None, :] - diff)).sum(axis=-1))

Disclaimer: I did not check whether this is the same as your formula; I only checked it against the non-vectorized version you presented.
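To see why the broadcasting in that one-liner lines up, here is a minimal NumPy sketch of the shapes involved (made-up dimensions):

import numpy as np

n_persons, n_items, n_dims = 4, 3, 2
theta = np.zeros((n_persons, n_dims))
discrim = np.zeros((n_items, n_dims))
diff = np.zeros((n_items, n_dims))

# theta[:, None, :] has shape (n_persons, 1, n_dims); subtracting diff
# broadcasts against (n_items, n_dims), giving (n_persons, n_items, n_dims).
inner = discrim * (theta[:, None, :] - diff)
print(inner.shape)               # (4, 3, 2)

# Summing over the last (ability) axis leaves one logit per person-item pair.
print(inner.sum(axis=-1).shape)  # (4, 3)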

Thank you for your response. I modified your code a little and ran pm.sample() on the whole model. I changed the simulated data, since your responses = np.random.normal(0, 1, 10) creates continuous values while IRT models expect either 0 or 1. A non-simulated data set we could use is

import pandas as pd

responses = pd.read_stata("https://www.stata-press.com/data/r18/masc1.dta")
n_persons, n_items = responses.shape  # should be 800, 9

Anyhow, with either this data or the simulated data (my code above), your model runs but seems to produce different results than my “crude version” above. In particular, both theta variables seem to be very tightly distributed around 0 in your model, while they aren’t in mine.

Have you tried evaluating the priors and transformed priors with fixed input values like I did above? Can you spot whether any of them are different? Are you using your transformed priors in the likelihood correctly? You can evaluate model likelihoods with

model.compile_logp()({"var_name": var_value,...})
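For instance, a minimal sketch (using the crude_MIRT_2PL model from above; model.initial_point() returns a dictionary of valid initial values for all free variables, which you can pass straight in):

logp_fn = crude_MIRT_2PL.compile_logp()
point = crude_MIRT_2PL.initial_point()  # initial values for all free (transformed) variables
print(logp_fn(point))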

If everything is the same, one would expect the end results to be the same.

I tried to generalize my 2PL code from the original posting by extending the shape values of the priors. The idea in an actual multidimensional IRT is always to exclude some of the abilities from some of the items, so as to achieve identification.

import numpy as np
import pandas as pd
import pymc as pm

# The dataset includes 800 student responses to 9 test questions 
# intended to measure mathematical ability.
responses = pd.read_stata("https://www.stata-press.com/data/r18/masc1.dta")
n_persons, n_items = responses.shape

# Create matrix of 1's and 0's for item loadings
item_loads = np.vstack((np.tile(np.array([1,1,0]),(5,1)),
                        np.tile(np.array([1,0,1]),(4,1))))
# Arbitrarily chose the first five items to load only on the first two abilities

with pm.Model() as mirt_2PL:
    # Parameters priors
    discrim = pm.LogNormal('discrim', mu=0, sigma=1, shape=(n_items, 3))
    diff = pm.Normal('diff', mu=0, sigma=2, shape=n_items)

    # Create 'discrim' parameters with structural zeros
    discrim_str = pm.Deterministic('discrim_str', discrim * item_loads)

    # Abilities prior
    theta = pm.Normal('theta', mu=0, sigma=1, shape=(n_persons, 3))

    # 2PL MIRT Model    
    p = pm.math.invlogit(pm.math.dot(theta, discrim_str.T) + diff)

    # Likelihood
    observed = pm.Bernoulli('observed', p=p, observed=responses)

    trace = pm.sample(2000, tune=1000, chains=2, cores=2)

The item_loads matrix establishes the structure of the latent abilities, such that discrim_str = pm.Deterministic('discrim_str', discrim * item_loads) then creates

\mathbf{A} = \begin{bmatrix} a_{1,1} & a_{1,2} & 0 \\ \vdots & \vdots & \vdots \\ a_{5,1} & a_{5,2} & 0 \\ a_{6,1} & 0 & a_{6,3} \\ \vdots & \vdots & \vdots \\ a_{9,1} & 0 & a_{9,3} \end{bmatrix} = \begin{bmatrix} \vec{a_1} \\ \vdots \\ \vec{a_5} \\ \vec{a_6} \\ \vdots \\ \vec{a_9} \end{bmatrix}

which means each row of \mathbf{A} corresponds to the \vec{a_j} in

P(x_{ij}=1) = \frac{1}{1 + e^{D \left( \vec{a_j}^{\mathsf{T}} \vec{\theta}_{i} + \gamma_j \right) }}

from above. For instance

P(x_{i5}=1) = \frac{1}{1 + e^{D \left( \vec{a_5}^{\mathsf{T}} \vec{\theta}_{i} + \gamma_5 \right) }} = \frac{1}{1 + e^{D \left[ a_{5,1} \theta_{i,1} + a_{5,2} \theta_{i,2} + \gamma_5 \right] }}

and

P(x_{i6}=1) = \frac{1}{1 + e^{D \left( \vec{a_6}^{\mathsf{T}} \vec{\theta}_{i} + \gamma_6 \right) }} = \frac{1}{1 + e^{D \left[ a_{6,1} \theta_{i,1} + a_{6,3} \theta_{i,3} + \gamma_6 \right] }}

The code above runs, but I have doubts that pm.math.dot(theta, discrim_str.T) properly indexes the discrimination parameters and latent abilities, as compared to discrim * (theta[:, None] - diff) in the one-dimensional 2PL. Some more sophisticated version of [:, None] seems to be needed.
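One way to reassure yourself about the indexing is a plain NumPy check (with random made-up values) that the dot product computes exactly the per-item weighted sums of abilities:

import numpy as np

rng = np.random.default_rng(42)
n_persons, n_items, n_dims = 6, 9, 3
theta = rng.normal(size=(n_persons, n_dims))
A = rng.lognormal(size=(n_items, n_dims))  # stands in for discrim_str

# Vectorized: (n_persons, n_dims) @ (n_dims, n_items) -> (n_persons, n_items)
vec = theta @ A.T

# Explicit double loop over persons i and items j
loop = np.empty((n_persons, n_items))
for i in range(n_persons):
    for j in range(n_items):
        loop[i, j] = sum(A[j, k] * theta[i, k] for k in range(n_dims))

print(np.allclose(vec, loop))  # True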

What is the non-vectorized code that you are comparing the code above to? It is quite hard to get into the formulas above, but if you provide non-vectorized code which you believe works correctly, I can try to help you diagnose where the difference stems from.

In the end, if p is identical in both cases for whatever initial parameters you supply, and if the usage of p is correct inside the likelihood, I would strongly expect the results to be similar. So the first thing to check is whether p evaluates to the same value in both cases. If it does not, you can go step by step into the priors it depends on and evaluate them to compare.

If both p’s are equal and the results are different, then check the likelihood. If the likelihoods give the same values for different initial points but the results are very different (i.e., HDIs of one model not covering the estimates of the other, etc.), then there is something else we are not considering.

I think a benchmark for whether the MIRT code does what it should would be to split responses into two “artificial” tests and use item_loads such that effectively a uni-dimensional latent ability \theta is being estimated.

# Create matrix of 1's and 0's for item loadings
item_loads = np.vstack((np.tile(np.array([1,0]),(5,1)),
                        np.tile(np.array([0,1]),(4,1))))

This code, in conjunction with discrim_str = pm.Deterministic('discrim_str', discrim * item_loads), and changing the shapes of discrim to (n_items, 2) and theta to (n_persons, 2), should produce a matrix

\mathbf{A} = \begin{bmatrix} a_{1,1} & 0 \\ a_{2,1} & 0 \\ \vdots & \vdots \\ a_{5,1} & 0 \\ 0 & a_{6,2} \\ \vdots & \vdots \\ 0 & a_{9,2} \end{bmatrix}

and thereby effectively include only \theta_1 for the first five questions, and only \theta_2 for the last four.

In theory, this should give the same theta posteriors as just estimating the uni-dimensional 2PL from the opening post using only first_test, and then only second_test, defined by

first_test = responses.iloc[:, :5]
second_test = responses.iloc[:, 5:]
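A minimal sketch of that benchmark model (reusing the mirt_2PL setup from above with the shapes adjusted as described; the model name is mine):

with pm.Model() as benchmark_mirt_2PL:
    discrim = pm.LogNormal('discrim', mu=0, sigma=1, shape=(n_items, 2))
    diff = pm.Normal('diff', mu=0, sigma=2, shape=n_items)

    # Structural zeros from the block-diagonal item_loads above
    discrim_str = pm.Deterministic('discrim_str', discrim * item_loads)

    theta = pm.Normal('theta', mu=0, sigma=1, shape=(n_persons, 2))

    p = pm.math.invlogit(pm.math.dot(theta, discrim_str.T) + diff)
    observed = pm.Bernoulli('observed', p=p, observed=responses)

    trace_benchmark = pm.sample(2000, tune=1000)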

I have run both models (vectorized and non-vectorized) on the data you have suggested and the posterior samples are almost identical (left unvectorized, right vectorized)

(posterior plots for diff, theta, and discrim omitted)

The models are here (although I must say there were divergences even with target_accept=0.98, so if you build on top of the vectorized version you may need to improve your model):

import pymc as pm
import pandas as pd
import pytensor.tensor as ptt
import arviz as az

responses = pd.read_stata("https://www.stata-press.com/data/r18/masc1.dta")
n_persons, n_items = responses.shape

with pm.Model() as nonvectorized_MIRT_2PL:

    discrim1 = pm.LogNormal('discrim1', mu=0, sigma=1, shape=n_items)
    discrim2 = pm.LogNormal('discrim2', mu=0, sigma=1, shape=n_items)
    diff1 = pm.Normal('diff1', mu=0, sigma=1, shape=n_items)
    diff2 = pm.Normal('diff2', mu=0, sigma=1, shape=n_items)

    #Abilities prior
    theta1 = pm.Normal('theta1', mu=0, sigma=1, shape=n_persons)
    theta2 = pm.Normal('theta2', mu=0, sigma=1, shape=n_persons)

    #stacked for posterior plots
    pm.Deterministic("discrim", ptt.stack([discrim1, discrim2]).T)
    pm.Deterministic("diff", ptt.stack([diff1, diff2]).T)
    pm.Deterministic("theta", ptt.stack([theta1, theta2]).T)

    p = pm.math.invlogit(1.7 * discrim1 * (theta1[:, None] - diff1) +
                         1.7 * discrim2 * (theta2[:, None] - diff2))

    obs = pm.Bernoulli('obs', p=p, observed=responses)

    trace1 = pm.sample(3000, tune=3000, chains=6, target_accept=0.98)


with pm.Model() as vectorized_MIRT_2PL:

    discrim = pm.LogNormal('discrim', mu=0, sigma=1, shape=(n_items, 2))
    diff = pm.Normal('diff', mu=0, sigma=1, shape=(n_items, 2))
    theta = pm.Normal('theta', mu=0, sigma=1, shape=(n_persons, 2))

    p = pm.math.invlogit((1.7 * discrim * (theta[:, None, :] - diff)).sum(axis=-1))

    obs = pm.Bernoulli('obs', p=p, observed=responses)

    trace2 = pm.sample(3000, tune=3000, chains=6, target_accept=0.98)

az.plot_posterior(trace1, var_names=["discrim"])
az.plot_posterior(trace2, var_names=["discrim"])

az.plot_posterior(trace1, var_names=["theta"])
az.plot_posterior(trace2, var_names=["theta"])

az.plot_posterior(trace1, var_names=["diff"])
az.plot_posterior(trace2, var_names=["diff"])

The vectorization was only a suggestion for a speedup; it was expected to be equivalent. If you are getting divergences even with such a high target_accept, that’s where we should have focused, and it means you have a model that is poorly identified and/or needs to be reparametrized.

When you increase target_accept, the sampler needs more evaluations, which will certainly make it slower.

@iavicenna, I think you may be right. I ran both the uni-dimensional 2PL on just the first 5 items of pd.read_stata("https://www.stata-press.com/data/r18/masc1.dta") and the two-dimensional 2PL with structural zeros on the second ability for the first five questions, and the results are indeed nearly identical. I was plotting not with ArviZ but with my own homebrew matplotlib histograms; that must’ve been the mistake. Thank you so much for your help!

And because I’m not too familiar with ArviZ as of yet, just one final question: in az.plot_posterior(trace2, var_names=["theta"]), how can I selectively plot only the first dimension of theta, or theta only for the first test-taker?

You can do it as such:

import numpy as np
import pymc as pm
import arviz as az

means = [10, 20]
sds = [2, 5]
N = 100
obs0 = np.random.normal(means[0], sds[0], N)
obs1 = np.random.normal(means[1], sds[1], N)

obs = np.array([obs0, obs1])


coords = {"group":range(2),
          "obs":range(N)}

with pm.Model(coords=coords) as model:

  sds = pm.HalfNormal("sds", 5, dims="group")
  means = pm.HalfNormal("means", 10, dims="group")

  # Tie the observed data to the named coords
  pm.Normal("likelihood", means, sds, observed=obs.T, dims=("obs", "group"))

  trace = pm.sample(1000, tune=1000)


az.plot_posterior(trace.posterior[["sds"]].sel({"group": [0]}))
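Applied to your MIRT model, something like the following should work. A sketch, assuming PyMC auto-named the dimensions of theta as theta_dim_0 (persons) and theta_dim_1 (abilities) because the model used shape rather than dims; check trace2.posterior["theta"].dims for the actual names:

# First ability dimension of theta, for all test-takers
az.plot_posterior(trace2.posterior[["theta"]].sel({"theta_dim_1": [0]}))

# All ability dimensions of theta, for the first test-taker only
az.plot_posterior(trace2.posterior[["theta"]].sel({"theta_dim_0": [0]}))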