OOS predictions with missing input values

junpenglao · October 26, 2017, 6:22am

There are not many ways to debug the error in sample_ppc. For example, in your case it is essentially doing y_modeled.random(trace[random_index]), which will return the same error message so it doesnt help.
I would probably just write my own OOS prediction code instead of building a new model, which is much faster and you also have better control:

nppc = 100
ppc = []
for idx in range(nppc):
    X_mask = np.random.multivariate_normal(means, cov, X_test.shape[0])
    X_mod = X_mask * X_test.mask + X_test.data * (1 - X_test.mask)
    point = trace[idx]
    b = point['beta']
    eps = point['sd']
    est = np.matmul(X_mod, b)
    y_pred = np.random.normal(est, eps)
    ppc.append(y_pred)

However, the above code is wrong in multivariate case. You should NOT use a i.i.d. multivariate drop-in to replace the missing value in X_test. In this case, the missing information is partially missing – for example, say one row in X is [ma., 0.5, 1.2] and X ~ MvNormal([0., 0., 0.], cov). Instead of generate X_ ~ MvNormal([0., 0., 0.], cov) and do ma. = X_[0], you need to work out the conditional probability of ma. by marginalizing the MvNormal using [.5, 1.2] and cov, and generate from this new conditional probability (also a Gaussian) instead.

Topic		Replies	Views
OOS predictions of covariate with "out-of-range" missing values Questions	6	643	August 12, 2024
Masking missing values of predictors Questions	3	1271	July 10, 2020
"Out of sample" predictions with the GLM sub-module Questions	9	2783	February 4, 2021
Logistic Regression w/ Missing Data? Questions	7	2848	September 11, 2017
Automatic imputation - array dimension problem Questions	2	647	February 10, 2022

OOS predictions with missing input values

Related topics