I believe I have solved my issue. In the spirit of good forum practice, here is the solution for whomever it may concern in the future:
import pandas as pd
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Long-format responses: one row per (user, question, correct) triple.
df = pd.DataFrame([["alice", "x", 1],
                   ["alice", "y", 1],
                   ["bob", "x", 1],
                   ["bob", "y", 0],
                   ["charlie", "x", 1]],
                  columns=['user', 'question', 'correct'])

# Pivot to a users x questions matrix; unanswered questions become NaN.
data = df.pivot(index='user', columns='question', values='correct')

# Wrap the matrix in a theano shared variable so the NaNs survive the conversion.
obs = tt._shared(np.ma.masked_invalid(data.values))

with pm.Model() as model:
    ## Independent priors
    alpha = pm.Normal('User', mu=0, sigma=3, shape=(1, len(data)))      # per-user ability
    gamma = pm.Normal('Task', mu=0, sigma=3, shape=(data.shape[1], 1))  # per-question difficulty

    ## Log-Likelihood
    def logp(obs):
        # Rasch model: P(correct) = sigmoid(ability - centred difficulty).
        rasch = tt.nnet.sigmoid(alpha - (gamma - gamma.mean(0)))
        # Zero out the missing cells so they contribute nothing to either term.
        corrects = tt.switch(tt.isnan(obs), 0, obs)
        incorrects = tt.switch(tt.isnan(obs), 0, 1 - obs)
        correct = tt.transpose(corrects) * tt.log(rasch)
        incorrect = tt.transpose(incorrects) * tt.log(1 - rasch)
        return correct + incorrect

    ll = pm.DensityDist('ll', logp, observed=obs)
    trace = pm.sample(cores=1)
    trace = trace[250:]  # drop the first 250 draws as extra burn-in
I check for NaNs in my observations and exclude them from the computation. An important additional detail is that I no longer pass the masked NumPy array straight to the likelihood: when a masked array is converted to a theano tensor, the masked values come through as 0, so the information about which cells were missing is lost. Instead I build the theano tensor myself with theano's _shared and then switch on tt.isnan to drop those cells from the log-likelihood.
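To see what the switch-on-NaN trick does on its own, here is a minimal standalone sketch. It reuses the same tt._shared / np.ma.masked_invalid call as above, but on a made-up 3x2 toy matrix (the values are purely illustrative, not the DataFrame from the model):

import numpy as np
import theano.tensor as tt

# Toy response matrix with one missing entry in the last row.
values = np.array([[1., 1.],
                   [1., 0.],
                   [1., np.nan]])

# Same construction as in the model: the NaN is preserved in the shared tensor.
obs = tt._shared(np.ma.masked_invalid(values))

# Where the observation is NaN, both the "correct" and "incorrect" indicators
# are forced to 0, exactly as inside logp above.
corrects = tt.switch(tt.isnan(obs), 0, obs)
incorrects = tt.switch(tt.isnan(obs), 0, 1 - obs)

print(corrects.eval())    # [[1. 1.] [1. 0.] [1. 0.]]
print(incorrects.eval())  # [[0. 0.] [0. 1.] [0. 0.]]

Both indicators are exactly 0 in the missing cell, so that cell contributes nothing to either term of the log-likelihood.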
Thank you very much, ricardoV94, for helping me debug this!