Dealing With Missing Data

I’m working on a dataset that has a lot of missing data. Rather than dropping the rows with missing values, I’d like to estimate those values using a Bayesian approach.

I found two examples of how to do this:

  1. @fonnesbeck’s Intro Stat Modeling 2017 - Dealing with Missing Data
  2. Ruslan Salakhutdinov and Andriy Mnih’s Probabilistic Matrix Factorization for Making Personalized Recommendations

In the first approach, a predictive regression model is used for each column. In the second approach, a single factor model is used for everything.
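To make the first approach concrete, here is a minimal numpy sketch of imputing a single column with a conjugate Bayesian linear regression. It is not the notebook's code. It assumes the predictor columns are fully observed (a full model would also have to handle their missing values), and the helper name `bayes_linreg_impute`, the prior precision `alpha`, and the known noise precision `beta` are hypothetical choices for illustration.

```python
import numpy as np


def bayes_linreg_impute(X, y, alpha=1.0, beta=25.0):
    """Impute NaNs in y from fully observed predictors X (hypothetical helper).

    Conjugate Bayesian linear regression: prior w ~ N(0, alpha^-1 I) and
    known noise precision beta. Returns the filled-in y and the posterior
    predictive variance of each imputed entry.
    """
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    obs = ~np.isnan(y)
    Xo, yo = X1[obs], y[obs]
    # Posterior over weights is N(m, S) with S^-1 = alpha*I + beta * Xo^T Xo
    S = np.linalg.inv(alpha * np.eye(X1.shape[1]) + beta * Xo.T @ Xo)
    m = beta * S @ Xo.T @ yo
    Xm = X1[~obs]
    # Predictive variance = noise variance + uncertainty in the weights
    pred_var = 1.0 / beta + np.einsum("ij,jk,ik->i", Xm, S, Xm)
    y_filled = y.copy()
    y_filled[~obs] = Xm @ m  # posterior predictive mean
    return y_filled, pred_var


# Synthetic example: y depends linearly on two predictors, ~20% of y missing
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y_true = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.2, size=200)
missing = rng.random(200) < 0.2
y = np.where(missing, np.nan, y_true)
y_filled, pred_var = bayes_linreg_impute(X, y)
print("mean abs error on imputed entries:", np.abs(y_filled - y_true)[missing].mean())
```

With one such model per column, each imputed value comes with its own predictive variance, which is one reason the per-column approach is easy to interpret.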

The second approach seems more efficient and straightforward, since a single model handles every column, but it may also be more computationally intensive.
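For the second approach, a rough sketch of a PMF-style model shows where the cost comes from. This is only a MAP point estimate, fit with alternating least squares (a swapped-in optimizer, not necessarily the one used in the paper), and it assumes Gaussian priors on the row and column factors with a single shared ridge penalty `lam`. The names `pmf_impute`, `rank`, `lam`, and `n_sweeps` are hypothetical. Each sweep solves one small `rank` x `rank` linear system per row and per column, so a point estimate is fairly cheap; a fully Bayesian version would instead have to sample all (rows + columns) x `rank` latent factors, which is likely where most of the extra computation would come from.

```python
import numpy as np


def pmf_impute(R, rank=2, lam=0.1, n_sweeps=100, seed=0):
    """MAP fit of a PMF-style model on the observed entries of R (hypothetical helper).

    R has NaN where entries are missing. Minimizes
        sum_observed (R_ij - U_i . V_j)^2 + lam * (||U||^2 + ||V||^2)
    by alternating least squares. Returns (R with NaNs filled, full reconstruction).
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    mask = ~np.isnan(R)
    U = rng.normal(size=(n, rank))
    V = rng.normal(size=(m, rank))
    reg = lam * np.eye(rank)
    for _ in range(n_sweeps):
        # With V fixed, each row factor U_i is a small ridge regression
        for i in range(n):
            Vo = V[mask[i]]
            U[i] = np.linalg.solve(Vo.T @ Vo + reg, Vo.T @ R[i, mask[i]])
        # With U fixed, each column factor V_j is a small ridge regression
        for j in range(m):
            Uo = U[mask[:, j]]
            V[j] = np.linalg.solve(Uo.T @ Uo + reg, Uo.T @ R[mask[:, j], j])
    R_hat = U @ V.T
    return np.where(mask, R, R_hat), R_hat


# Synthetic example: a noisy rank-2 matrix with ~20% of entries missing
rng = np.random.default_rng(1)
A = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 8))  # true low-rank matrix
R = A + rng.normal(scale=0.05, size=A.shape)
missing = rng.random(A.shape) < 0.2
R[missing] = np.nan
R_filled, R_hat = pmf_impute(R)
print("mean abs error on imputed entries:", np.abs(R_filled - A)[missing].mean())
```

This returns point estimates only; getting uncertainty for the imputed values would require sampling from (or approximating) the full posterior.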

I was wondering what other people think about the pros and cons of each approach, and when one would use one vs. the other.

Thanks!

related discussion:
