I have the following model, where the feature matrix
X is of shape NxK and some of the regressors contain missing values.
with pm.Model(coords=coords) as rnd_icpt_mdl: X_imp = np.nan_to_num(x=X, nan=-999) masked_values = np.ma.masked_array(X_imp, mask=X_imp == -999) beta = pm.Normal("beta", mu=0.0, sigma=1.0, shape=n_vars) # Slopes - same for all groups X = pm.Normal("X", mu=0.0, sigma=1.0, observed=masked_values) # Imputed X values eps = pm.InverseGamma(name="eps", alpha=9.0, beta=4.0) # Model error y_hat = pm.math.dot(X, K) # Model prediction y_like = pm.Normal("y_like", y_hat, sigma=eps, observed=data["DepVar"], dims="obs_id") # Data likelihood
I impute the missing values using a masked array.
However, there are some obscurities.
What is the impact of the distribution (here Normal) on my feature matrix? Are only the imputed/missing values drawn from a normal distribution, or does the whole feature matrix X become normally distributed? Based on what should I choose the distribution and its parameter?
Do I have to pass a shape parameter to the imputation statement? If yes, what would be the shape? Something like this?
X = pm.Normal("X", mu=0.0, sigma=1.0, observed=masked_values, shape=K) # Imputed X values
Or is the shape equal to the shape of the feature matrix, i.e. shape=(NxK) ?