Dealing With Missing Data

alphamaximus · August 16, 2017, 2:07pm

I’m working on a dataset that has a lot of missing data. Rather than dropping those rows with missing data, I’d like to estimate the missing data using a Bayesian approach.
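Before choosing an imputation model, it helps to see how much a complete-case analysis would throw away. The sketch below is only an illustration and is not taken from either linked example. It assumes missing cells are stored as NaN in a NumPy array, and `missingness_summary` is a hypothetical helper. If each of p columns is missing independently at rate q, only about (1 - q)^p of the rows are complete, so dropping rows can discard far more data than the per-column missing rates suggest.

```python
import numpy as np

def missingness_summary(X):
    """Hypothetical helper: summarize NaN-coded missing cells in a 2-D array."""
    mask = np.isnan(X)                               # True where a value is missing
    per_column = mask.mean(axis=0)                   # fraction missing in each column
    complete_rows = int((~mask.any(axis=1)).sum())   # rows with no missing value at all
    return per_column, complete_rows

# Toy data (made up for illustration): 5 rows, 3 columns, NaN marks a missing value.
X = np.array([
    [1.0, 2.0, np.nan],
    [0.5, np.nan, 3.0],
    [1.5, 2.5, 3.5],
    [np.nan, 1.0, 2.0],
    [2.0, 3.0, 4.0],
])
per_column, complete_rows = missingness_summary(X)
print(per_column)      # [0.2 0.2 0.2]
print(complete_rows)   # 2
```

Every column here is only 20% missing, yet just 2 of the 5 rows would survive row-wise deletion.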

I found two examples on how to do this:

@fonnesbeck’s Intro Stat Modeling 2017 - Dealing with Missing Data
Ruslan Salakhutdinov and Andriy Mnih’s Probabilistic Matrix Factorization for Making Personalized Recommendations

In the first approach, a predictive regression model is used for each column. In the second approach, a single factor model (a low-rank factorization of the whole data matrix) is used for all columns at once.
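As a rough illustration of the first approach, here is a library-free sketch (NumPy only, not the notebook's code) for the simplest case: one incomplete column `y` regressed on one fully observed column `x`. It assumes the values are missing at random and uses the standard noninformative prior p(β, σ²) ∝ 1/σ². Under those assumptions (β, σ²) can be drawn exactly from their posterior given the observed rows, and each missing `y` is then drawn from the posterior predictive. A real per-column scheme also has to handle missing predictors, which this sketch leaves out; `impute_column_by_regression` and the toy data are hypothetical.

```python
import numpy as np

def impute_column_by_regression(x, y, n_draws=1000, seed=0):
    """Hypothetical helper: posterior-predictive draws for the NaN entries of y.

    Assumes x is fully observed, y is missing at random, and the model
    y = a + b*x + N(0, sigma^2) with prior p(beta, sigma^2) proportional to 1/sigma^2.
    """
    rng = np.random.default_rng(seed)
    miss = np.isnan(y)
    X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept
    X_obs, y_obs = X[~miss], y[~miss]
    n, p = X_obs.shape

    XtX_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = XtX_inv @ X_obs.T @ y_obs           # least-squares estimate
    rss = np.sum((y_obs - X_obs @ beta_hat) ** 2)  # residual sum of squares
    chol = np.linalg.cholesky(XtX_inv)

    draws = np.empty((n_draws, miss.sum()))
    for s in range(n_draws):
        # sigma^2 | y_obs ~ scaled inverse-chi^2 with n - p degrees of freedom
        sigma2 = rss / rng.chisquare(n - p)
        # beta | sigma^2, y_obs ~ N(beta_hat, sigma^2 (X'X)^-1)
        beta = beta_hat + np.sqrt(sigma2) * (chol @ rng.standard_normal(p))
        # y_mis | beta, sigma^2 ~ N(X_mis beta, sigma^2)
        mu = X[miss] @ beta
        draws[s] = mu + np.sqrt(sigma2) * rng.standard_normal(mu.shape)
    return draws

# Usage on simulated data with three hidden entries.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)
y[[3, 10, 27]] = np.nan
draws = impute_column_by_regression(x, y)
print(draws.shape)          # (1000, 3)
print(draws.mean(axis=0))   # posterior-mean imputations
```

Keeping all the draws for each missing cell, rather than a single filled-in value, is what carries the imputation uncertainty into any downstream analysis.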

It seems that the second approach is more efficient and straightforward, but it may also be more computationally intensive.
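For comparison, here is a sketch of the second approach: the MAP estimate of a probabilistic matrix factorization R ≈ U Vᵀ, with Gaussian noise on the observed cells and zero-mean Gaussian priors on the rows of U and V. This amounts to minimizing squared error on the observed cells plus an L2 penalty. The PMF paper minimizes this objective by gradient descent; the sketch swaps in alternating ridge regressions (ALS), which are short to write with NumPy. It assumes NaN marks a missing cell and uses a single penalty `lam` for both factor matrices; `pmf_map_als` and all parameter values are hypothetical. It returns a point estimate, not posterior draws.

```python
import numpy as np

def pmf_map_als(R, k=2, lam=0.1, n_iters=50, seed=0):
    """Hypothetical helper: MAP fit of R approx U @ V.T, ignoring NaN cells.

    lam stands in for sigma^2 / sigma_U^2 (= sigma^2 / sigma_V^2 here).
    Fitted by alternating ridge regressions, not gradient descent.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    obs = ~np.isnan(R)
    U = 0.1 * rng.standard_normal((n, k))
    V = 0.1 * rng.standard_normal((m, k))
    I = np.eye(k)
    for _ in range(n_iters):
        for i in range(n):   # ridge regression for each row factor, V held fixed
            Vi = V[obs[i]]
            U[i] = np.linalg.solve(Vi.T @ Vi + lam * I, Vi.T @ R[i, obs[i]])
        for j in range(m):   # ridge regression for each column factor, U held fixed
            Uj = U[obs[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + lam * I, Uj.T @ R[obs[:, j], j])
    return U, V

# Usage on a simulated rank-2 matrix with about 20% of cells hidden.
rng = np.random.default_rng(2)
R_full = rng.normal(size=(30, 2)) @ rng.normal(size=(8, 2)).T
R = R_full.copy()
R[rng.random(R.shape) < 0.2] = np.nan
U, V = pmf_map_als(R, k=2, lam=0.01, n_iters=200)
R_hat = U @ V.T                      # fills every cell, including the missing ones
miss = np.isnan(R)
rmse_missing = np.sqrt(np.mean((R_hat[miss] - R_full[miss]) ** 2))
print(rmse_missing)
```

Each ALS sweep solves n + m small k × k linear systems, so its cost grows roughly like (number of observed cells) · k² + (n + m) · k³. Whether that is heavier than fitting one regression per incomplete column depends on the shape of the data, on k, and on whether either model is sampled with MCMC rather than optimized, so the comparison is worth checking on the actual dataset.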

I was wondering what other people think about the pros and cons of each approach, and when one would use one vs. the other.

Thanks!

junpenglao · August 18, 2017, 9:20am

related discussion:

