Dear PyMC3 Community,
I am looking for someone that worked with missing data for both predictors (Xi) and y_obs.
My understanding is that there is no need to do the imputation beforehand, e.g. as part of a preprocessing data analysis pipeline. Hence, this can be written in a Bayesian way directly.
I would like to model a Bernoulli classification based on X1 and X2 that contain missing values.
If I excluded the missing value, I could run the model but if I want to keep the missing values, I get into bugs.
I would highly appreciate any advice in regards to this.
Please find below the script.
Thank you very much in advance
x_missing = np.isnan(x_train) X_train = np.ma.masked_array(x_train, mask=x_missing) #y_train.shape is (97, 1) #X_train.shape is (97, 2) X_shape = len(x_missing) with pm.Model() as model: #Define priors beta = pm.Normal ('beta', 0, 10) #Imputation of X missing values Xmu = pm.Normal('Xmu', 0, 1, shape=X_shape) X_modeled = pm.Normal('X', mu=Xmu, sd=10, observed=X_train) #Define likelihood lp = pm.Deterministic('lp', pm.math.dot(X_modeled, beta)) #Define posterior y_obs = pm.Bernoulli('y_obs', p=lp, observed=y_train) #Inference trace = pm.sample()