I’m a PyMC3 newbie. I’m trying to build a Bayesian Rasch model, but I don’t have a complete set of observations (i.e. an answer from every user to every question). I understand that PyMC3 ordinarily handles missing observations by imputing them automatically, sampling the missing values from the prior. In my case, however, this doesn’t seem to be happening, and I suspect it’s a consequence of my custom distribution.
In the example below I am deliberately missing an observation for `charlie` on item `y`. Removing `charlie`, everything works beautifully.
import pandas as pd
import numpy as np
import pymc3 as pm
import theano.tensor as tt

df = pd.DataFrame([["alice", "x", 1],
                   ["alice", "y", 1],
                   ["bob", "x", 1],
                   ["bob", "y", 0],
                   ["charlie", "x", 1]],
                  columns=['user', 'question', 'correct'])
data = df.pivot(index='user', columns='question', values='correct')
observations = np.ma.masked_invalid(data.values)

with pm.Model() as model:
    ## Independent priors
    alpha = pm.Normal('User', mu=0, sigma=3, shape=(1, len(data)))
    gamma = pm.Normal('Task', mu=0, sigma=3, shape=(data.shape[1], 1))

    ## Log-likelihood
    def logp(d):
        rasch = tt.nnet.sigmoid(alpha - (gamma - gamma.mean(0)))
        # P(x=1)
        v1 = tt.transpose(d) * tt.log(rasch)
        # P(x=0)
        v2 = tt.transpose(1 - d) * tt.log(1 - rasch)
        return v1 + v2

    ll = pm.DensityDist('ll', logp, observed={'d': observations})
    trace = pm.sample(cores=1)

# Discard burn-in
trace = trace[250:]
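To show concretely what I mean by the missing observation: run in isolation (no PyMC3 involved), the pivot plus masked_invalid step turns charlie’s unanswered question `y` into a masked NaN entry:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([["alice", "x", 1],
                   ["alice", "y", 1],
                   ["bob", "x", 1],
                   ["bob", "y", 0],
                   ["charlie", "x", 1]],
                  columns=['user', 'question', 'correct'])

# Pivot to a users-by-questions matrix; the (charlie, y) cell becomes NaN
data = df.pivot(index='user', columns='question', values='correct')

# Mask the NaN so PyMC3 can (in principle) treat it as a missing value
observations = np.ma.masked_invalid(data.values)
print(observations.mask)  # True exactly where the observation is missing
```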
What’s the correct way of solving this?
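(For completeness, I don’t think the logp formula itself is the problem: on fully-observed 0/1 data it is just the elementwise Bernoulli log-likelihood d·log(p) + (1−d)·log(1−p). A quick plain-numpy check with made-up values for `p` and `d` confirms that:)

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(0.1, 0.9, size=(2, 3))   # stand-in for the sigmoid output
d = rng.integers(0, 2, size=(2, 3))      # stand-in for observed 0/1 data

# The expression used in logp above, in plain numpy
v = d * np.log(p) + (1 - d) * np.log(1 - p)

# Elementwise Bernoulli log-likelihood, written out case by case
expected = np.where(d == 1, np.log(p), np.log(1 - p))
assert np.allclose(v, expected)
```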