@junpenglao, it seems that with the current model (using unique masks for the data) it is impossible to generalize the distribution to conditionals that do not occur in the dataset. Intuitively, this isn’t accurate because the right way of looking at this is as a generative process and a truly Bayesian method would allow for sampling from the conditionals not seen in the dataset.
Is there a possibility of generalizing the existing model to unseen conditional distributions? Maybe a way to model X_missing as part of the generative process rather than have it drawn iid?