I’m working on a dataset that has a lot of missing values. Rather than dropping the rows that contain them, I’d like to estimate the missing values using a Bayesian approach.
I found two examples on how to do this:
- @fonnesbeck’s Intro Stat Modeling 2017 - Dealing with Missing Data
- Ruslan Salakhutdinov and Andriy Mnih’s Probabilistic Matrix Factorization for Making Personalized Recommendations
In the first approach, a separate predictive regression model is fit for each column that has missing values. In the second approach, a single factor model covers all columns at once.
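To make the contrast concrete, here is a minimal pure-Python sketch of the two ideas as I understand them. It is not taken from either reference: it uses point estimates instead of full Bayesian inference (least squares for the regression, and a rank-1 alternating least squares fit with a small ridge penalty standing in for a Gaussian prior in the factor model). The function names, the `lam` and `n_iter` parameters, and the toy data are all hypothetical.

```python
# Point-estimate sketch of the two imputation strategies (not full Bayesian
# inference). All names are hypothetical; None marks a missing entry.

def impute_by_regression(rows, target, predictor):
    """Approach 1: fill `target` using a least-squares line on `predictor`."""
    pairs = [(r[predictor], r[target]) for r in rows
             if r[predictor] is not None and r[target] is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    filled = [list(r) for r in rows]  # copy so the input is left untouched
    for r in filled:
        if r[target] is None and r[predictor] is not None:
            r[target] = intercept + slope * r[predictor]
    return filled


def impute_by_rank1_factors(rows, lam=1e-6, n_iter=200):
    """Approach 2: fit rows[i][j] ~ u[i] * v[j] on observed entries only."""
    n_rows, n_cols = len(rows), len(rows[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(n_iter):
        # Update each row factor with the column factors held fixed.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if rows[i][j] is not None]
            u[i] = (sum(rows[i][j] * v[j] for j in obs)
                    / (lam + sum(v[j] ** 2 for j in obs)))
        # Update each column factor with the row factors held fixed.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if rows[i][j] is not None]
            v[j] = (sum(rows[i][j] * u[i] for i in obs)
                    / (lam + sum(u[i] ** 2 for i in obs)))
    # Keep observed values; fill the gaps with the model's reconstruction.
    return [[rows[i][j] if rows[i][j] is not None else u[i] * v[j]
             for j in range(n_cols)] for i in range(n_rows)]


# Toy data: exactly rank 1, with two entries removed.
data = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, None],
    [3.0, 6.0, 9.0],
    [4.0, None, 12.0],
]

# Approach 1 needs one model per incomplete column (here: columns 2 and 1).
by_regression = impute_by_regression(impute_by_regression(data, 2, 0), 1, 0)
# Approach 2 fits one model for the whole table.
by_factors = impute_by_rank1_factors(data)
```

Even in this toy version the structural difference shows up: the regression route needs a separate call (and a choice of predictor) for every incomplete column, while the factor route fits one model but iterates over the whole table.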
The second approach seems more compact and straightforward, since one model handles every column, but it may also be more computationally intensive.
I was wondering what other people think about the pros and cons of each approach, and when one would use one versus the other.