Exam data - inference of student performance

RCHA · April 28, 2020, 8:22pm

Super, thanks. Your post made me realise that I wasn't giving the sampler a test value (testval). This meant it was starting in a region of zero probability (logp = -inf), so, naturally, the sampler breaks.

This helpful post mentioned using a test_value for a binomial distribution.
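For context, a zero-probability start is easy to reproduce by hand: a Binomial log-probability is -inf whenever the count falls outside 0..n. In the model below, nH_obs = n - nE is scored by a Binomial with n=H, so any starting nE that makes n - nE negative or larger than H lands in that region. A minimal pure-Python sketch; binomial_logpmf is a hypothetical helper written for illustration, not a PyMC3 function, and the numbers are made up:

```python
import math


def binomial_logpmf(k: int, n: int, p: float) -> float:
    """Log-probability of k successes in n Bernoulli(p) trials.

    Hypothetical helper for illustration. Returns -inf outside the
    support 0 <= k <= n, which is where a sampler cannot start.
    """
    if k < 0 or k > n:
        return -math.inf
    # Degenerate probabilities: only one outcome is possible.
    if p == 0.0:
        return 0.0 if k == 0 else -math.inf
    if p == 1.0:
        return 0.0 if k == n else -math.inf
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


# Hypothetical test with H = 4 hard questions.
print(binomial_logpmf(3, 4, 0.5))   # inside the support: a finite value
print(binomial_logpmf(5, 4, 0.5))   # more successes than trials: -inf
print(binomial_logpmf(-1, 4, 0.5))  # negative count (e.g. n - nE < 0): -inf
```

Choosing testval so that every observed quantity starts inside its support avoids this.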

So the code now works, provided there are no missing values in the data. However, if I make the observed data n a masked array, the code breaks:

  File "main.py", line 43, in <module>
    nH_obs = n - nE  ## Observed number of hard questions answered correctly, given nE
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/theano/tensor/var.py", line 230, in __rsub__
    return theano.tensor.basic.sub(other, self)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/theano/gof/op.py", line 615, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/theano/tensor/elemwise.py", line 480, in make_node
    inputs = list(map(as_tensor_variable, inputs))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/theano/tensor/basic.py", line 194, in as_tensor_variable
    return constant(x, name=name, ndim=ndim)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/theano/tensor/basic.py", line 232, in constant
    x_ = scal.convert(x, dtype=dtype)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/theano/scalar/basic.py", line 284, in convert
    assert type(x_) in [np.ndarray, np.memmap]
AssertionError

It looks like I am not allowed to subtract a pymc3.model.FreeRV object from a masked numpy array (but a normal ndarray is fine); do you know if there is a reason for this?
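The last frame of the traceback shows the check that fails: Theano compares the exact type of the converted value against np.ndarray and np.memmap. A likely explanation (not confirmed against Theano's source here) is that the masked array reaches that check still as a MaskedArray. NumPy masked arrays are a subclass of ndarray, so an isinstance test accepts them, but an exact type test does not. A small NumPy sketch with a hypothetical 2 x 3 score matrix:

```python
import numpy as np

# Hypothetical 2 x 3 score matrix; -1 marks a missing score, as in the post.
raw = np.array([[3, -1, 5],
                [2, 4, -1]], dtype=np.int32)
n = np.ma.masked_equal(raw, value=-1)

# A masked array is an ndarray subclass...
print(isinstance(n, np.ndarray))                # True
# ...but an exact-type test like the one in theano/scalar/basic.py rejects it.
print(type(n) in [np.ndarray, np.memmap])       # False
# The underlying buffer is a plain ndarray and would pass the check.
print(type(n.data) in [np.ndarray, np.memmap])  # True

# Passing n.data would silence the error, but it also drops the mask, so the
# -1 placeholders would be treated as real scores. The mask therefore has to be
# carried separately, e.g. as a boolean array of observed entries.
observed = ~np.ma.getmaskarray(n)
print(observed)
```

This only explains the assertion; how to feed the missing entries into the model (imputation, or restricting the likelihood to observed entries) is a separate modelling decision.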

Here’s the full code:

import numpy as np
import pandas as pd
import pymc3 as pm

data = pd.read_csv('test_data.csv', header=[0, 1], index_col=0)  ## Two header rows; level 1 holds each test's max marks
data.fillna(-1, inplace=True)  ## Mark missing scores with -1
data = data.astype('int32')
print(data)
S = len(data.index)  ## Number of students
T = len(data.columns)  ## Number of tests
N = np.array(data.columns.get_level_values(1).values, dtype=int)  ## Max available marks for each test
n = data.to_numpy()
n = np.ma.masked_equal(n, value=-1)  ## Mask the missing scores

with pm.Model() as model:

    ## Number of hard (H) and easy (E) questions in each test
    H = pm.DiscreteUniform('H', lower=np.zeros(T, dtype=int), upper=N, shape=T, testval=N // 2)
    E = N - H
    ## Per-student probability of answering an easy / hard question correctly (pH <= pE)
    pE = pm.Uniform('pE', lower=0., upper=1., shape=S)
    pH = pm.Uniform('pH', lower=0., upper=pE, shape=S)
    ## Number of easy questions each student answered correctly in each test
    nE = pm.Binomial('nE', n=E[None, :], p=pE[:, None], shape=(S, T), testval=n // 2)
    nH_obs = n - nE  ## Observed number of hard questions answered correctly, given nE
    nH = pm.Binomial('nH', n=H[None, :], p=pH[:, None], shape=(S, T), observed=nH_obs)
    trace = pm.sample(5000, tune=10000, discard_tuned_samples=True)
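To sanity-check the model's logic without PyMC3, it can help to simulate fake data forward from the same generative story with NumPy. This is a sketch under assumptions: the sizes and maximum marks are hypothetical, NumPy 1.17 or later is assumed for default_rng, and it only generates data, it does not perform inference. It also makes the support constraint visible: n - nE recovers nH, which always lies between 0 and H.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 students, 3 tests with these maximum marks.
S = 4
N = np.array([10, 20, 15])
T = len(N)

# Mirror the priors in the model above.
H = rng.integers(0, N + 1)          # hard questions per test, uniform on 0..N
E = N - H                           # easy questions per test
pE = rng.uniform(0.0, 1.0, size=S)  # per-student skill on easy questions
pH = rng.uniform(0.0, pE)           # hard-question skill, bounded above by pE

# Broadcast the same way E[None, :] and pE[:, None] do in the model.
nE = rng.binomial(E[None, :], pE[:, None])  # shape (S, T)
nH = rng.binomial(H[None, :], pH[:, None])  # shape (S, T)
n = nE + nH                                 # observed total score per student and test

print(n)
```

Fitting the model to data simulated this way (with known H, pE and pH) is one way to check whether the sampler can recover the parameters before using the real exam data.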

Many thanks for taking the time to look at this.
