How to marginalize latent discrete variables in CDMs (cognitive diagnosis models)?

Hi! I’m trying to marginalize the latent classes out of the DINA and higher-order DINA models, but the results look bad. I think I’m marginalizing in the wrong way.
DINA is a classification model for educational assessment. See:
https://mc-stan.org/documentation/case-studies/dina_independent.html
The code for the simulated data:

import theano.tensor as tt
import pymc3 as pm
import numpy as np
import itertools
import arviz as az


N = 150   # student count
I = 14    # item count
K = 3     # skill count
# all possible attribute patterns (latent classes), shape=(2**K, K)
all_class = np.array(list(itertools.product(*[[0,1] for k in range(K)])))
# randomly assign each student one of the patterns
class_idx = np.random.randint(0,len(all_class),size=N)
student_attribute = all_class[class_idx]
# Q-matrix: skills required by each item
Q = np.concatenate([all_class[1:] for i in range(2)],axis=0)
# ideal response: 1 if the student masters every skill the item requires
eta = (student_attribute.reshape(N,1,K)>=Q.reshape(1,I,K)).prod(axis=-1)
# slip and guess parameters
s = np.random.uniform(0,0.2,(1,I))
g = np.random.uniform(0,0.2,(1,I))
# observed responses: P(correct) = eta*(1-s) + (1-eta)*g
Y = np.random.binomial(1,eta*(1-s-g)+g)
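
(As a quick sanity check on the simulated data — not part of the model itself — I look at how the students spread over the 8 possible patterns; class_idx is the ground truth I later compare the classifications against.)

# number of simulated students in each of the 2**K = 8 classes
print(np.bincount(class_idx, minlength=len(all_class)))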

DINA model

The original version

In the original version, the probability of a student belonging to each category has to be converted into a discrete category (a draw from pm.Categorical).


# ideal response for all classes
all_c_eta = (all_class.reshape(-1,1,K)>=Q.reshape(1,I,K)).prod(axis=-1)
all_c_eta = tt.as_tensor(all_c_eta)
with pm.Model() as DINA:

    # probability of each student belonging to each of the 8 categories
    student_class_p = pm.Dirichlet("student_class_p",tt.ones(8),shape=(N,8))
    # latent discrete category of each student
    student_class = pm.Categorical("student_class",student_class_p,shape=N)

    # slip and guess parameters
    s_sample = pm.Beta("s",alpha=1,beta=1,shape=(1,I))
    g_sample = pm.Beta("g",1,1,shape=(1,I))
    
    # probability of a correct response for every possible attribute-mastery pattern, shape=(C, I)
    all_c_p = pm.Deterministic("all_class_p",all_c_eta*(1-s_sample-g_sample)+g_sample)
    p_sample = pm.Deterministic("p_sample",all_c_p[student_class])
    
    pm.Bernoulli("Y",p_sample,observed=Y)

with DINA:
    tr = pm.sample(1000,tune=3000,chains=2,return_inferencedata=True)

The posterior of student_class
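
(For reference, the classification rates I mention further down are computed roughly like this — a quick sketch, where samples and hard_class are just my own helper names: take the most frequently sampled class per student and compare it with the true class_idx.)

# posterior samples of student_class, shape (chains*draws, N)
samples = tr.posterior["student_class"].values.reshape(-1, N)
# hard classification: the most frequently sampled class for each student
hard_class = np.array([np.bincount(samples[:, n].astype(int), minlength=len(all_class)).argmax()
                       for n in range(N)])
print("classification rate:", (hard_class == class_idx).mean())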

Marginalized version

Then I tried to marginalize out student_class.

P(Y) = \sum_{c=1}^{C} \pi_c P(Y \mid c)

\pi_c is the probability that a student falls into category c (student_class_p in the code), and P(Y \mid c) is the probability of a correct response given category c (all_class_p in the code).

all_c_eta = (all_class.reshape(-1,1,K)>=Q.reshape(1,I,K)).prod(axis=-1)
all_c_eta = tt.as_tensor(all_c_eta)
with pm.Model() as DINA:
    # The probability that a student falls into a certain category
    student_class_p = pm.Dirichlet("student_class_p",tt.ones(8),shape=(N,8))
 
   
    s_sample = pm.Beta("s",alpha=1,beta=1,shape=(1,I))
    g_sample = pm.Beta("g",1,1,shape=(1,I))
    
    # Probability of correct answer for all categories
    all_c_p = pm.Deterministic("all_class_p",all_c_eta*(1-s_sample-g_sample)+g_sample)

    p_sample = pm.Deterministic("p_sample",student_class_p.dot(all_c_p))
    
    pm.Bernoulli("Y",p_sample,observed=Y)

The posterior mean of student_class_p


After my attempt at marginalization, the classification looks poor: for each student, the posterior probabilities of falling into the different categories are all very close to one another.

Higher-order DINA

In addition, I also tried the higher-order DINA model, in which whether a student masters a skill is determined by a higher-order ability.

\text{skill\_p}_{nk} = \operatorname{logit}^{-1}(\lambda_k \theta_n + \lambda_{0k})

The probability of mastering skill k is given above, where \theta_n is the higher-order ability and \lambda_{0k} and \lambda_k are the intercept and slope for skill k, respectively.

In the original version, the probability of mastering a particular skill has to be converted into a dichotomous variable (skill_p → skill) before the probability of a correct answer is computed. In the marginalized version, the skill-mastery probabilities are first converted into the probability of belonging to each category, and the probability of a correct answer is then computed from those category probabilities.
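
Concretely, with attribute pattern \alpha_c for category c, the membership probability is the product over skills:

P(\text{class}_n = c) = \prod_{k=1}^{K} \text{skill\_p}_{nk}^{\alpha_{ck}} (1-\text{skill\_p}_{nk})^{1-\alpha_{ck}}

which is what the student_class_p line in the code below computes.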

all_c_eta = (all_class.reshape(-1,1,K)>=Q.reshape(1,I,K)).prod(axis=-1)
all_c_eta = tt.as_tensor(all_c_eta)
with pm.Model() as HODINA:
    # higher-order ability
    theta_sample = pm.Normal("theta",0,1,shape=(N,1))
    # intercept and slope
    lambda_0_sample = pm.Normal("lambda0",0,1,shape=(1,K))
    lambda_k_sample = pm.TruncatedNormal("lambdak",1,1,lower=0,shape=(1,K))
    
    # The probability of mastering a particular skill
    skill_p_sample = pm.Deterministic("skill_p",pm.math.sigmoid(lambda_k_sample*theta_sample+lambda_0_sample))
    
    # Convert the probability that a student masters each skill into
    # the probability that the student belongs to each category
    student_class_p =pm.Deterministic("student_class_p",tt.exp(tt.log(skill_p_sample).dot(all_class.T)
                                        +tt.log(1-skill_p_sample).dot(1-all_class.T)))
    
    s_sample = pm.Beta("s",alpha=1,beta=1,shape=(1,I))
    g_sample = pm.Beta("g",1,1,shape=(1,I))
    
    
    all_p_p = pm.Deterministic("all_pattern_p",all_c_eta*(1-s_sample-g_sample)+g_sample)
    p_sample = pm.Deterministic("p_sample",student_class_p.dot(all_p_p))
    
    pm.Bernoulli("Y",p_sample,observed=Y)

However, the classification results of this model do not look good either: for some students, the probabilities of falling into the different categories are very close.

The posterior mean of student_class_p:

However, when I added a residual term to skill_p, the classification results improved:

\text{skill\_p}_{nk} = \operatorname{logit}^{-1}(\lambda_k \theta_n + \lambda_{0k} + \epsilon_{nk})

with pm.Model() as HODINA:
    # higher-order ability
    theta_sample = pm.Normal("theta",0,1,shape=(N,1))
    # intercept and slope
    lambda_0_sample = pm.Normal("lambda0",0,1,shape=(1,K))
    lambda_k_sample = pm.TruncatedNormal("lambdak",1,1,lower=0,shape=(1,K))
    
    # residuals on the logit scale of skill_p
    sig = pm.HalfNormal("sig",2)
    epsilon = pm.Normal("epsilon",0,sig,shape=(N,K))
    
    # The probability of mastering a particular skill, now with a residual term
    skill_p_sample = pm.Deterministic("skill_p",pm.math.sigmoid(lambda_k_sample*theta_sample
                                 +lambda_0_sample+epsilon))
    # (the rest of the model -- class probabilities, s, g, and the Bernoulli
    # likelihood -- is unchanged from the HODINA model above)

The posterior mean of student_class_p:


And the estimated residuals are very large.


In short, I have 2 questions:
q1: How do I correctly marginalize the latent discrete parameters of the DINA model?
q2: Why does the residual term improve classification performance, and is it necessary to add a residual term when marginalizing? Are there any references on this?

Thanks for your time!
Any help would be appreciated!

I don’t see where you marginalized the categorical variable. You would have to enumerate over all the possibilities and collect the respective conditional probabilities (or use a helper that does this like pm.Mixture), but you seem to have simply dropped that variable?
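
For the “enumerate everything” route, something along these lines might work (an untested sketch using pm.Potential and logsumexp, reusing your student_class_p, all_c_p, and Y from the first model): per student, sum the Bernoulli log-likelihoods over items within each class, then logsumexp the C class log-likelihoods against the log class weights.

# untested sketch: replaces the p_sample / pm.Bernoulli lines inside the model
Yt = tt.as_tensor(Y)                                        # observed responses, (N, I)
log_p = tt.log(all_c_p)                                     # log P(correct | class), (C, I)
log_q = tt.log1p(-all_c_p)                                  # log P(incorrect | class), (C, I)
# class-conditional log-likelihood of each student's full response vector, (N, C)
loglik_nc = tt.dot(Yt, log_p.T) + tt.dot(1 - Yt, log_q.T)
# marginalize the class per student: logsumexp over the C classes
marginal = pm.math.logsumexp(tt.log(student_class_p) + loglik_nc, axis=1)
pm.Potential("marginal_loglike", marginal.sum())

In equation form this is P(Y_n) = \sum_{c=1}^{C} \pi_{nc} \prod_{i=1}^{I} p_{ci}^{Y_{ni}} (1-p_{ci})^{1-Y_{ni}} — the product over items sits inside the sum over classes.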

I think I misunderstood marginalization. I will try pm.Mixture().

The code I rewrote is as follows

# ideal response for all student categories
all_c_eta = (all_class.reshape(-1,1,K)>=Q.reshape(1,I,K)).prod(axis=-1)
all_c_eta = tt.as_tensor(all_c_eta)
with pm.Model() as DINA:
    # number of categories is 8
    C = 8
    # student category-membership probabilities, shape=(N,C)
    student_class_p = pm.Dirichlet("student_class_p",tt.ones(C),shape=(N,C))
    
    # repeat student_class_p I times and reshape to (N,I,C):
    # the mixture weight of each category, broadcast across items
    w = tt.repeat(student_class_p,I).reshape((N,C,I)).transpose(0,2,1)
    
    s_sample = pm.Beta("s",alpha=1,beta=1,shape=(1,I))
    g_sample = pm.Beta("g",1,1,shape=(1,I))
    
    # probability of a correct response for each category, shape=(C,I)
    all_c_p = pm.Deterministic("all_class_p",all_c_eta*(1-s_sample-g_sample)+g_sample)
    
    # repeat all_c_p N times and reshape to (N,I,C):
    # the category-conditional correct-response probabilities
    n_i_c = tt.repeat(all_c_p,N).reshape((C,I,N)).transpose(2,1,0)
    component_dist = pm.Bernoulli.dist(p=n_i_c,shape=(N,I,C))
    
    pm.Mixture("Y",w=w,comp_dists=component_dist,observed=Y)
    # Y.shape=(N,I)

But judging from the results, there seems to be something wrong. :thinking:
The correct classification rate is 78.7%, but it should be closer to 85.3% (the rate I get when sampling the latent discrete variables directly).

Thanks for your help and time :grin:~