# How to marginalize latent discrete variables in CDMs (cognitive diagnosis models)?

Hi!I’m trying to marginalize DINA and the higher-order DINA model, but the results look terrible. I think I’m marginalizing it in the wrong way.
DINA is a classification model for educational assessment. See:
https://mc-stan.org/documentation/case-studies/dina_independent.html
The code for the simulated data:

import theano.tensor as tt
import pymc3 as pm
import numpy as np
import itertools
import arviz as az

N = 150   # student count
I = 14    # item count
K = 3     # skill count
# all class
all_class = np.array(list(itertools.product(*[[0,1] for k in range(K)])))
# random sample from all possible class
class_idx = np.random.randint(0,len(all_class),size=N)
student_attribute = all_class[class_idx]
# Q-matrix
Q = np.concatenate([all_class[1:] for i in range(2)],axis=0)
# ideal response
eta = (student_attribute.reshape(N,1,K)>=Q.reshape(1,I,K)).prod(axis=-1)
s = np.random.uniform(0,0.2,(1,I))
g = np.random.uniform(0,0.2,(1,I))
Y = np.random.binomial(1,eta*(1-s-g)+g)


# DINA model

## The original version

In the original version, I needed to convert the probability of a student belonging to a certain category into a discrete category.


# ideal response for all classes
all_c_eta = (all_class.reshape(-1,1,K)>=Q.reshape(1,I,K)).prod(axis=-1)
all_c_eta = tt.as_tensor(all_c_eta)
with pm.Model() as DINA:

student_class_p = pm.Dirichlet("student_class_p",tt.ones(8),shape=(N,8))
student_class = pm.Categorical("student_class",student_class_p,shape=N)

s_sample = pm.Beta("s",alpha=1,beta=1,shape=(1,I))
g_sample = pm.Beta("g",1,1,shape=(1,I))

# 所有可能的属性掌握模式作答正确的概率，shape=C*I
all_c_p = pm.Deterministic("all_class_p",all_c_eta*(1-s_sample-g_sample)+g_sample)
p_sample = pm.Deterministic("p_sample",all_c_p[student_class])

pm.Bernoulli("Y",p_sample,observed=Y)

with DINA:
tr = pm.sample(1000,tune=3000,chains=2,return_inferencedata=True)


The posterior of student_class

## Marginalized version

Then I try to marginalize student_class.

P(Y) = \sum_1^c \pi_c P(Y|c)

\pi_c is the probability that a student falls into a certain category (student_class_p in code), P(Y|c) is the probability of a student’s answer correct in category c (all_class_p in code).

all_c_eta = (all_class.reshape(-1,1,K)>=Q.reshape(1,I,K)).prod(axis=-1)
all_c_eta = tt.as_tensor(all_c_eta)
with pm.Model() as DINA:
# The probability that a student falls into a certain category
student_class_p = pm.Dirichlet("student_class_p",tt.ones(8),shape=(N,8))

s_sample = pm.Beta("s",alpha=1,beta=1,shape=(1,I))
g_sample = pm.Beta("g",1,1,shape=(1,I))

# Probability of correct answer for all categories
all_c_p = pm.Deterministic("all_class_p",all_c_eta*(1-s_sample-g_sample)+g_sample)

p_sample = pm.Deterministic("p_sample",student_class_p.dot(all_c_p))

pm.Bernoulli("Y",p_sample,observed=Y)


The posterior mean of student_class_p

After my attempt at marginalization, the categorization seemed to work poorly.The probability of students falling into different categories is very close

# Higher-order DINA

In addition, I also tried Higher-Order DINA. Whether students master skills is determined by higher levels of ability.

skill\_P_{nk} = logit^{-1}(\lambda_k\theta+\lambda0_k)

The difference lies in that whether students master specific skills is determined by higher-order ability. The probability of mastering skill K is as above, and lambda0 and lambdak are intercept and slope respectively.

In the original version, the probability of mastering a particular skill had to be converted into a dichotomous variable (skill_p → skill) before the probability of a student answering correctly was calculated. In the current version, the probability of a student mastering a specific skill is converted into the probability of a student belonging to a certain category, and then the probability of a student answering correctly is calculated from the probability of a student belonging to a certain category.

all_c_eta = (all_class.reshape(-1,1,K)>=Q.reshape(1,I,K)).prod(axis=-1)
all_c_eta = tt.as_tensor(all_c_eta)
with pm.Model() as HODINA:
# higher-order ability
theta_sample = pm.Normal("theta",0,1,shape=(N,1))
# intercept and slope
lambda_0_sample = pm.Normal("lambda0",0,1,shape=(1,K))
lambda_k_sample = pm.TruncatedNormal("lambdak",1,1,lower=0,shape=(1,K))

# The probability of mastering a particular skill
skill_p_sample = pm.Deterministic("skill_p",pm.math.sigmoid(lambda_k_sample*theta_sample+lambda_0_sample))

# Convert the probability that students belong to a
# certain skill into the probability that students belong to a certain category
student_class_p =pm.Deterministic("student_class_p",tt.exp(tt.log(skill_p_sample).dot(all_class.T)
+tt.log(1-skill_p_sample).dot(1-all_class.T)))

s_sample = pm.Beta("s",alpha=1,beta=1,shape=(1,I))
g_sample = pm.Beta("g",1,1,shape=(1,I))

all_p_p = pm.Deterministic("all_pattern_p",all_c_eta*(1-s_sample-g_sample)+g_sample)
p_sample = pm.Deterministic("p_sample",student_class_p.dot(all_p_p))

pm.Bernoulli("Y",p_sample,observed=Y)


However, the classification results of the model do not seem to be good. The probability of some students falling into different categories is close

The posterior mean of student_class_p:

However, when I added residuals to SKILL_P, the classification results improved

skill\_P_{nk} = logit^{-1}(\lambda_k\theta+\lambda0_k+\epsilon_{nk})
with pm.Model() as HODINA:
# higher-order ability
theta_sample = pm.Normal("theta",0,1,shape=(N,1))
# intercept and slope
lambda_0_sample = pm.Normal("lambda0",0,1,shape=(1,K))
lambda_k_sample = pm.TruncatedNormal("lambdak",1,1,lower=0,shape=(1,K))

# Skill_p residuals
sig = pm.HalfNormal("sig",2)
epsilon = pm.Normal("epsilon",0,sig,shape=(N,K))

# The probability of mastering a particular skill
skill_p_sample = pm.Deterministic("skill_p",pm.math.sigmoid(lambda_k_sample*theta_sample
+lambda_0_sample+epsilon))



The posterior mean of student_class_p:

And the residuals are very large

# END

In short, I have 2 questions:
q1: How to correctly marginalize the potential discrete parameters of the DINA model?
q2: Why can residual item improve classification performance, and is it necessary to add a residual item when marginalizing? Are there any references to this?

Any help would be appreciated!

I don’t see where you marginalized the categorical variable. You would have to enumerate over all the possibilities and collect the respective conditional probabilities (or use a helper that does this like pm.Mixture), but you seem to have simply dropped that variable?

I think I misunderstand marginalization.I will try pm.Mixture().

The code I rewrote is as follows

# ideal response for all student categories
all_c_eta = (all_class.reshape(-1,1,K)>=Q.reshape(1,I,K)).prod(axis=-1)
all_c_eta = tt.as_tensor(all_c_eta)
with pm.Model() as DINA:
# number of categories is 8
C = 8
# students categories, shape=(N,C)
student_class_p = pm.Dirichlet("student_class_p",tt.ones(C),shape=(N,C))

# repeat all_c_p I times and transform the shape of all_c_p as (N,I,C)
# the possibility in category c
w = tt.repeat(student_class_p,I).reshape((N,C,I)).transpose(0,2,1)

s_sample = pm.Beta("s",alpha=1,beta=1,shape=(1,I))
g_sample = pm.Beta("g",1,1,shape=(1,I))

# correct possibility of all possible categories, shape=(C,I)
all_c_p = pm.Deterministic("all_class_p",all_c_eta*(1-s_sample-g_sample)+g_sample)

# repeat all_c_p N times and transform the shape of all_c_p as (N,I,C)
# correct possibility
n_i_c = tt.repeat(all_c_p,N).reshape((C,I,N)).transpose(2,1,0)
compond_dist = pm.Bernoulli.dist(p=n_i_c,shape=(N,I,C))

pm.Mixture("Y",w=w,comp_dists=compond_dist,observed=Y)
# Y.shape=(N,I)


But judging from the results, there seems to be something wrong. The correct classification rate is 78.7%, but it should be closer to 85.3%( directly sample from potential discrete variables ).

Thanks for your help and time ~