Hi! I’m trying to marginalize the DINA and higher-order DINA models, but the results look terrible. I think I’m marginalizing them the wrong way.
DINA (deterministic inputs, noisy “and” gate) is a classification model used in educational assessment. See:
https://mc-stan.org/documentation/case-studies/dina_independent.html
The code for the simulated data:
import theano.tensor as tt
import pymc3 as pm
import numpy as np
import itertools
import arviz as az
N = 150 # student count
I = 14 # item count
K = 3 # skill count
# all possible attribute-mastery patterns (2^K = 8 classes)
all_class = np.array(list(itertools.product(*[[0,1] for k in range(K)])))
# randomly assign each student to one of the possible classes
class_idx = np.random.randint(0,len(all_class),size=N)
student_attribute = all_class[class_idx]
# Q-matrix
Q = np.concatenate([all_class[1:] for i in range(2)],axis=0)
# ideal response: 1 only if the student masters every skill the item requires
eta = (student_attribute.reshape(N,1,K)>=Q.reshape(1,I,K)).prod(axis=-1)
# slip and guess parameters
s = np.random.uniform(0,0.2,(1,I))
g = np.random.uniform(0,0.2,(1,I))
# observed responses
Y = np.random.binomial(1,eta*(1-s-g)+g)
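For reference, the item response function used in the simulation (the line eta*(1-s-g)+g) is the usual DINA one:

P(Y_{ij}=1 \mid \eta_{ij}) = \eta_{ij}(1 - s_j) + (1 - \eta_{ij})\, g_j = \eta_{ij}(1 - s_j - g_j) + g_j

so a student who masters all required skills (\eta_{ij}=1) answers correctly with probability 1 - s_j, and otherwise only guesses correctly with probability g_j.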
DINA model
The original version
In the original version, the probabilities of a student belonging to each category have to be converted into a discrete category assignment (a latent Categorical variable) before the response probabilities are computed.
# ideal response for all classes
all_c_eta = (all_class.reshape(-1,1,K)>=Q.reshape(1,I,K)).prod(axis=-1)
all_c_eta = tt.as_tensor(all_c_eta)
with pm.Model() as DINA:
student_class_p = pm.Dirichlet("student_class_p",tt.ones(8),shape=(N,8))
student_class = pm.Categorical("student_class",student_class_p,shape=N)
s_sample = pm.Beta("s",alpha=1,beta=1,shape=(1,I))
g_sample = pm.Beta("g",1,1,shape=(1,I))
    # probability of a correct answer for every possible attribute-mastery pattern, shape=C*I
all_c_p = pm.Deterministic("all_class_p",all_c_eta*(1-s_sample-g_sample)+g_sample)
p_sample = pm.Deterministic("p_sample",all_c_p[student_class])
pm.Bernoulli("Y",p_sample,observed=Y)
with DINA:
tr = pm.sample(1000,tune=3000,chains=2,return_inferencedata=True)
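Just for context, this is roughly how I look at the class posterior afterwards (a minimal sketch using the trace tr from above, not the exact code behind the figure below):

# posterior class proportions per student, computed from the Categorical draws
post_class = tr.posterior["student_class"].stack(sample=("chain", "draw")).values
# fraction of posterior draws assigning student 0 to each of the 8 classes
np.bincount(post_class[0], minlength=len(all_class)) / post_class.shape[1]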
The posterior of student_class
Marginalized version
Then I tried to marginalize out student_class.
Here \pi_c is the probability that a student falls into category c (student_class_p in the code), and P(Y|c) is the probability that a student in category c answers an item correctly (all_class_p in the code).
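Written out, what the code below computes for student i and item j is the per-item mixture

P(Y_{ij} = 1) = \sum_{c} \pi_{ic}\, P(Y_{ij} = 1 \mid c) = \sum_{c} \pi_{ic} \left[ \eta_{cj}(1 - s_j - g_j) + g_j \right]

which is exactly the dot product student_class_p.dot(all_c_p).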
all_c_eta = (all_class.reshape(-1,1,K)>=Q.reshape(1,I,K)).prod(axis=-1)
all_c_eta = tt.as_tensor(all_c_eta)
with pm.Model() as DINA:
# The probability that a student falls into a certain category
student_class_p = pm.Dirichlet("student_class_p",tt.ones(8),shape=(N,8))
s_sample = pm.Beta("s",alpha=1,beta=1,shape=(1,I))
g_sample = pm.Beta("g",1,1,shape=(1,I))
# Probability of correct answer for all categories
all_c_p = pm.Deterministic("all_class_p",all_c_eta*(1-s_sample-g_sample)+g_sample)
p_sample = pm.Deterministic("p_sample",student_class_p.dot(all_c_p))
pm.Bernoulli("Y",p_sample,observed=Y)
The posterior mean of student_class_p
After my attempt at marginalization, the classification seems to work poorly: the probabilities of a student falling into the different categories are all very close to each other.
Higher-order DINA
In addition, I also tried the higher-order DINA model, in which whether a student masters each skill is driven by a single higher-order ability \theta. The probability of mastering skill k is P(\alpha_k = 1 \mid \theta) = \mathrm{sigmoid}(\lambda_k \theta + \lambda_{0k}), where lambda0 and lambdak are the intercept and slope.
In the original version, the probability of mastering a particular skill had to be converted into a dichotomous variable (skill_p → skill) before computing the probability of a correct answer. In the marginalized version below, the skill-mastery probabilities are instead converted into the probability that a student belongs to each category, and the probability of a correct answer is then computed from those category probabilities.
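Concretely, the conversion implemented below with the exp/log dot products is

\pi_{ic} = \prod_{k=1}^{K} p_{ik}^{\alpha_{ck}} (1 - p_{ik})^{1 - \alpha_{ck}}

where p_{ik} is skill_p for student i and skill k, and \alpha_{ck} is the k-th entry of attribute pattern c (a row of all_class); this treats the skills as conditionally independent given \theta.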
all_c_eta = (all_class.reshape(-1,1,K)>=Q.reshape(1,I,K)).prod(axis=-1)
all_c_eta = tt.as_tensor(all_c_eta)
with pm.Model() as HODINA:
# higher-order ability
theta_sample = pm.Normal("theta",0,1,shape=(N,1))
# intercept and slope
lambda_0_sample = pm.Normal("lambda0",0,1,shape=(1,K))
lambda_k_sample = pm.TruncatedNormal("lambdak",1,1,lower=0,shape=(1,K))
# The probability of mastering a particular skill
skill_p_sample = pm.Deterministic("skill_p",pm.math.sigmoid(lambda_k_sample*theta_sample+lambda_0_sample))
    # convert the probability that a student masters each skill
    # into the probability that the student belongs to each category
student_class_p =pm.Deterministic("student_class_p",tt.exp(tt.log(skill_p_sample).dot(all_class.T)
+tt.log(1-skill_p_sample).dot(1-all_class.T)))
s_sample = pm.Beta("s",alpha=1,beta=1,shape=(1,I))
g_sample = pm.Beta("g",1,1,shape=(1,I))
all_p_p = pm.Deterministic("all_pattern_p",all_c_eta*(1-s_sample-g_sample)+g_sample)
p_sample = pm.Deterministic("p_sample",student_class_p.dot(all_p_p))
pm.Bernoulli("Y",p_sample,observed=Y)
However, the classification results of this model do not look good either: for some students, the probabilities of falling into different categories are very close.
The posterior mean of student_class_p:
However, when I added a residual term to skill_p (the rest of the model is unchanged), the classification results improved:
with pm.Model() as HODINA:
# higher-order ability
theta_sample = pm.Normal("theta",0,1,shape=(N,1))
# intercept and slope
lambda_0_sample = pm.Normal("lambda0",0,1,shape=(1,K))
lambda_k_sample = pm.TruncatedNormal("lambdak",1,1,lower=0,shape=(1,K))
# Skill_p residuals
sig = pm.HalfNormal("sig",2)
epsilon = pm.Normal("epsilon",0,sig,shape=(N,K))
# The probability of mastering a particular skill
skill_p_sample = pm.Deterministic("skill_p",pm.math.sigmoid(lambda_k_sample*theta_sample
+lambda_0_sample+epsilon))
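Written out, the only change is the extra residual term in the skill-mastery logit:

P(\alpha_{ik} = 1) = \mathrm{sigmoid}(\lambda_k \theta_i + \lambda_{0k} + \epsilon_{ik}), \qquad \epsilon_{ik} \sim \mathcal{N}(0, \sigma^2), \ \sigma \sim \mathrm{HalfNormal}(2)

with everything downstream (student_class_p and the Bernoulli likelihood) kept the same as in the model above.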
The posterior mean of student_class_p:
And the estimated residuals are very large.
In short, I have 2 questions:
q1: How do I correctly marginalize out the latent discrete parameters of the DINA model?
q2: Why does the residual term improve classification performance, and is it necessary to add a residual term when marginalizing? Are there any references on this?
Thanks for your time!
Any help would be appreciated!