Missing Data Imputation - Obscurities

michnag · January 18, 2022, 10:44am

I have the following model, where the feature matrix X is of shape NxK and some of the regressors contain missing values.

import numpy as np
import pymc as pm  # on older installs: import pymc3 as pm

# Mask the NaN entries directly instead of going through a -999 sentinel
masked_values = np.ma.masked_invalid(np.asarray(X, dtype=float))

with pm.Model(coords=coords) as rnd_icpt_mdl:

    beta = pm.Normal("beta", mu=0.0, sigma=1.0, shape=n_vars)  # Slopes - same for all groups

    # New name so the raw data array X is not shadowed by the model variable
    X_modeled = pm.Normal("X", mu=0.0, sigma=1.0, observed=masked_values)  # Imputed X values

    eps = pm.InverseGamma(name="eps", alpha=9.0, beta=4.0)  # Model error

    y_hat = pm.math.dot(X_modeled, beta)  # Model prediction: features times slopes (was dot(X, K))

    y_like = pm.Normal("y_like", y_hat, sigma=eps, observed=data["DepVar"], dims="obs_id")  # Data likelihood
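
As a sanity check on the prediction step, here is a small NumPy-only sketch of the shapes involved. It assumes the prediction is meant to be the product of the N×K feature matrix with a length-K slope vector; the toy sizes and the names X_toy, beta_toy and y_hat_toy are made up for illustration and are not part of the model above.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 5, 3  # hypothetical toy sizes
X_toy = rng.normal(size=(N, K))   # stand-in for the feature matrix
beta_toy = rng.normal(size=K)     # stand-in for the slope vector

# (N, K) @ (K,) gives one prediction per observation, i.e. shape (N,)
y_hat_toy = X_toy @ beta_toy
print(y_hat_toy.shape)
```

The result has one entry per row of the feature matrix, which is what the likelihood over obs_id would need.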

I impute the missing values using a masked array.
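
To make the masking step concrete, here is a toy NumPy-only sketch (X_toy and the other names are hypothetical). It compares the sentinel route used above with np.ma.masked_invalid, which masks NaN entries directly. On this toy data both routes yield the same mask; the sentinel route would additionally mask any genuine value equal to -999.

```python
import numpy as np

# Toy feature matrix with two missing entries (hypothetical data)
X_toy = np.array([[1.0, np.nan, 3.0],
                  [np.nan, 5.0, 6.0]])

# Sentinel route: replace NaN with -999, then mask every -999
X_imp = np.nan_to_num(x=X_toy, nan=-999)
masked_sentinel = np.ma.masked_array(X_imp, mask=X_imp == -999)

# Direct route: mask NaN (and inf) entries without a sentinel
masked_direct = np.ma.masked_invalid(X_toy)

# Compare the boolean masks of the two results
same_mask = np.array_equal(np.ma.getmaskarray(masked_sentinel),
                           np.ma.getmaskarray(masked_direct))
print(same_mask)
print(masked_direct)
```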
However, a few points are unclear to me.

What is the impact of the distribution (here Normal) on my feature matrix? Are only the missing values drawn from the normal distribution, or is the whole feature matrix X treated as normally distributed? On what basis should I choose the distribution and its parameters?
Do I have to pass a shape argument to the imputation statement? If so, what should the shape be? Something like this?

 X = pm.Normal("X", mu=0.0, sigma=1.0, observed=masked_values, shape=K)  # Imputed X values

Or is the shape equal to the shape of the feature matrix, i.e. shape=(N, K)?
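
For context on the shape question, this toy NumPy-only sketch (hypothetical data and names) shows what the masked array itself carries. It keeps the full (N, K) shape of the feature matrix, and only the masked entries lack a value. It says nothing about how the sampler treats those entries.

```python
import numpy as np

N, K = 4, 3  # hypothetical toy sizes
X_toy = np.arange(N * K, dtype=float).reshape(N, K)
X_toy[0, 1] = np.nan
X_toy[2, 0] = np.nan
X_toy[3, 2] = np.nan

masked = np.ma.masked_invalid(X_toy)
mask = np.ma.getmaskarray(masked)

n_missing = int(mask.sum())            # entries without a value
n_observed = int(masked.count())       # entries with a value
missing_per_column = mask.sum(axis=0)  # missing count per regressor

print(masked.shape, n_missing, n_observed, missing_per_column)
```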
