IRT w/ PyMC3, unusual number of parameters modeled?

I’m relatively new to PyMC3 and am trying to set up a synthetic IRT (item response theory) model. The code below does sample, with decent r_hat values but very low effective sample sizes:

import numpy as np
import pymc3 as pm
import arviz as az

# observed responses: rows are students, columns are questions (1 = correct)
k = np.array([1,1,1,1,1,
              1,1,1,1,0,
              1,1,1,0,0,
              1,1,0,0,0,
              1,0,0,0,0]).reshape(5,5)

students = 5
questions = 5

with pm.Model() as model:
    # student multilevel prior
    sigma_student = pm.HalfNormal('sigma_student',sigma=1,shape=1)
    mu_student = pm.Normal('mu_student',mu=0,sigma=1,shape=1)
    z_student = pm.Normal("z_student", mu=mu_student, sigma=sigma_student, shape=students)

    # question single-level prior
    sigma_question = pm.HalfNormal('sigma_question',sigma=1,shape=1)
    z_question = pm.Normal("z_question", mu=0, sigma=sigma_question, shape=questions)
    
    # Transformed parameter
    theta = pm.Deterministic("theta", pm.math.sigmoid(z_student - z_question))
     
    # Likelihood
    kij = pm.Bernoulli("kij", p=theta, observed=k)
    trace = pm.sample(chains=4)

az.plot_trace(trace, var_names=["z_student", "z_question"], compact=False);

Now, when I analyze the results, I see:

                      mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_mean  ess_sd  ess_bulk  ess_tail  r_hat
mu_student[0]        0.351  0.611  -0.762    1.527      0.019    0.013    1058.0  1058.0    1054.0    1891.0   1.00
z_student[0]         1.160  1.146  -0.759    3.289      0.050    0.035     533.0   533.0     557.0     959.0   1.01
z_student[1]         0.775  0.940  -0.834    2.697      0.035    0.025     706.0   684.0     745.0    1752.0   1.00
z_student[2]         0.357  0.835  -1.281    1.882      0.022    0.016    1380.0  1380.0    1333.0    2247.0   1.00
z_student[3]         0.103  0.892  -1.629    1.694      0.026    0.019    1142.0  1142.0    1079.0    1953.0   1.01
z_student[4]        -0.255  0.980  -2.102    1.533      0.032    0.023     937.0   937.0     858.0    1664.0   1.01
z_question[0]       -0.694  1.006  -2.767    0.785      0.041    0.029     592.0   592.0     800.0    1001.0   1.01
z_question[1]       -0.306  0.779  -1.945    1.002      0.027    0.022     826.0   647.0     967.0    1053.0   1.01
z_question[2]       -0.027  0.766  -1.403    1.647      0.019    0.020    1712.0   771.0    1670.0    1363.0   1.01
z_question[3]        0.231  0.763  -1.043    1.955      0.020    0.017    1492.0  1005.0    1450.0    1340.0   1.01
z_question[4]        0.483  0.862  -0.944    2.230      0.030    0.021     833.0   833.0     922.0    1568.0   1.01
sigma_student[0]     0.919  0.535   0.164    1.877      0.025    0.018     463.0   463.0     284.0     130.0   1.01
sigma_question[0]    0.840  0.519   0.101    1.740      0.030    0.021     294.0   294.0     239.0     242.0   1.02
theta[0]             0.816  0.138   0.570    0.999      0.005    0.004     720.0   709.0     795.0    1576.0   1.01
theta[1]             0.716  0.152   0.442    0.972      0.004    0.003    1484.0  1317.0    1351.0    1966.0   1.00
theta[2]             0.584  0.168   0.269    0.887      0.004    0.003    1634.0  1519.0    1659.0    2127.0   1.01
theta[3]             0.474  0.179   0.159    0.815      0.006    0.005     940.0   755.0     973.0     383.0   1.00
theta[4]             0.350  0.177   0.050    0.676      0.006    0.004     815.0   782.0     892.0    1114.0   1.01

What troubles me is that there are 5 learned values for theta. Personally, I wasn’t interested in theta at all except as an intermediate step; however, there should be 25 values, one for every possible student/question combination. The fact that there are only 5 makes me worried that this code isn’t doing what I expected.

When I try to write the computation as an explicit for loop I get an error: ValueError: Variable name kij already exists. Nonetheless, it perhaps illustrates better what I want:

students = 5
questions = 5

with pm.Model() as model1:
    # student multilevel prior
    sigma_student = pm.HalfNormal('sigma_student',sigma=1,shape=1)
    mu_student = pm.Normal('mu_student',mu=0,sigma=1,shape=1)
    z_student = pm.Normal("z_student", mu=mu_student, sigma=sigma_student, shape=students)

    # question single-level prior
    sigma_question = pm.HalfNormal('sigma_question',sigma=1,shape=1)
    z_question = pm.Normal("z_question", mu=0, sigma=sigma_question, shape=questions)
    
    # Likelihood
    for i in range(students):
      for j in range(questions):
        si = z_student[i]
        qj = z_question[j]
        response = pm.Deterministic("response", pm.math.sigmoid(si - qj))
        kij *= pm.Bernoulli("kij", p=response, observed=k[i,j])
    posterior = pm.sample(chains=4)

az.plot_trace(posterior, var_names=["z_student", "z_question"], compact=False);

Could someone help me clear up:

  1. Why is theta a vector of length 5 and what does this represent?
  2. Why can I not update the likelihood iteratively in a for loop (or, if I can, how)?

As for the first question, the shape of theta is 5 because both z_student and z_question have a shape of (5,), so their element-wise difference also has shape (5,). To get a value for each combination, you need to either instantiate variables with a shape of (25,) (which I think is probably not what you want) or instantiate them with shapes that can be broadcast. If one has a shape of (1, 5) and the other has a shape of (5, 1), the difference will be broadcast to (5, 5), giving you the 25 values.
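
For example, a minimal sketch of the reshaped model (reusing your k, students, and questions; I have not run this against your data, so treat it as illustrative rather than definitive):

with pm.Model() as model2:
    # student multilevel prior, shaped (students, 1) so students form the rows
    sigma_student = pm.HalfNormal('sigma_student', sigma=1)
    mu_student = pm.Normal('mu_student', mu=0, sigma=1)
    z_student = pm.Normal('z_student', mu=mu_student, sigma=sigma_student,
                          shape=(students, 1))

    # question prior, shaped (1, questions) so questions form the columns
    sigma_question = pm.HalfNormal('sigma_question', sigma=1)
    z_question = pm.Normal('z_question', mu=0, sigma=sigma_question,
                           shape=(1, questions))

    # (students, 1) - (1, questions) broadcasts to (students, questions),
    # giving one success probability per student/question pair
    theta = pm.Deterministic('theta', pm.math.sigmoid(z_student - z_question))

    kij = pm.Bernoulli('kij', p=theta, observed=k)
    trace = pm.sample(chains=4)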


Nice, this worked out exactly. I would never have guessed this; from what I remember of linear algebra, column vectors cannot be subtracted from row vectors (and vice versa).

Could you link a resource on “broadcasting”? Seems like something to master early on for good PyMC3 modeling.

I would agree that broadcasting is useful - IMO it is one of the most powerful features for constructing sophisticated models, and you are right that it is quite distinct from the usual rules of linear algebra. It is a little challenging to track down information on broadcasting because PyMC3 relies on Theano’s broadcasting rules, which are themselves reproduced from NumPy’s. For many use cases, if you can broadcast two variables per the NumPy rules, you can do the same thing in Theano. See here and here for more info.
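
To make the rule concrete, here is a plain NumPy illustration of the same (5, 1) minus (1, 5) broadcast (the values are arbitrary, just to show the shapes):

import numpy as np

col = np.arange(5).reshape(5, 1)   # shape (5, 1), think of it as the "student" axis
row = np.arange(5).reshape(1, 5)   # shape (1, 5), think of it as the "question" axis

diff = col - row                   # broadcasts to shape (5, 5)
print(diff.shape)                  # (5, 5); diff[i, j] == col[i, 0] - row[0, j]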


Oh, awesome - thank you! By the way, since you seem to be quite familiar with this, you may have thoughts on an extension of this topic: