Dear PyMC3 Community,
I am looking for someone that worked with missing data for both predictors (Xi) and y_obs.
My understanding is that there is no need to do the imputation beforehand, e.g. as part of a preprocessing data analysis pipeline. Hence, this can be written in a Bayesian way directly.
I would like to model a Bernoulli classification based on X1 and X2 that contain missing values.
If I excluded the missing value, I could run the model but if I want to keep the missing values, I get into bugs.
I would highly appreciate any advice in regards to this.
Please find below the script.
Thank you very much in advance
x_missing = np.isnan(x_train)
X_train = np.ma.masked_array(x_train, mask=x_missing)
#y_train.shape is (97, 1)
#X_train.shape is (97, 2)
X_shape = len(x_missing)
with pm.Model() as model:
#Define priors
beta = pm.Normal ('beta', 0, 10)
#Imputation of X missing values
Xmu = pm.Normal('Xmu', 0, 1, shape=X_shape)
X_modeled = pm.Normal('X', mu=Xmu, sd=10, observed=X_train)
#Define likelihood
lp = pm.Deterministic('lp', pm.math.dot(X_modeled, beta))
#Define posterior
y_obs = pm.Bernoulli('y_obs', p=lp, observed=y_train)
#Inference
trace = pm.sample()