Bayesian neural networks and missing data

I’m trying to piece together a few different aspect’s I’ve become familiar with in PyMC3, but I am struggling conceptually to understand how they might work together. I understand how to account for missing data in covariates and how to structure a bayesian neural network (BNN) for a problem that actually requires it, but I can’t figure out how they can be used within the same model. Ideally what I am doing each time record is passed through the network is to have it account for the observed values and then account for the underlying variability in the missing values by interpreting it’s possible values from a latent distribution.

Conceptually I need to pass the matrix of covariates to the BNN, for which I then tune the weights between nodes data during sampling but the construction of that matrix with missing data seems to be an issue. To do it accurately I would want to represent each feature in the matrix with it’s own latent distribution so that values are properly generated for missing data during sampling but I run into a few issues.

Method 1: Create missingness in a single feature, create distribution associated with that feature, sample from it for missing values, then rejoin it with the complete observed data before passing it to the neural network. Here I get an issue with shape because each missing value is represented by it’s own sequence rather than a single value.

Method 2: For an individual record, create duplicates where the only difference between duplicates is the value of the missing feature. That missing value is generated from sampling the specified distribution. I knew this wouldn’t really work but wanted to see what the effect was to verify. Doing this you effectively teach the BNN to overemphasize the other observed characteristics on the single record leading to serious overfitting on the wrong parameters.

It’s entirely possible I am either missing something obvious or explaining this poorly. Please let me know if there is anything I can do to help clarify because I’d imagine this is likely of interest to quite a few people!