Hello there,
I am new to Bayesian modeling and have been experimenting with PyMC for a few months now. First of all, I would like to thank everyone here for the wealth of knowledge available on this forum. It has been invaluable in getting me started!
I am working on a dataset with a significant amount of missing values, and I am unsure how to approach the problem in the context of Bayesian modeling. The dataset involves a mix of continuous and categorical variables, and the missing data appears to follow some non-random patterns.
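To make concrete what I mean by non-random missingness, here is a minimal toy sketch in plain Python. Everything in it is made up for illustration (the simulated data, the `p_missing` helper and its `slope` parameter); it is not my real dataset and not PyMC code. It assumes larger values are more likely to go unrecorded, so the values that remain give a biased picture of the full data.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical toy data (not the real dataset): draws from Normal(0, 1).
full = [random.gauss(0.0, 1.0) for _ in range(10_000)]


def p_missing(value, slope=1.5):
    """Assumed missingness mechanism: the chance a value is dropped
    rises with the value itself (logistic curve)."""
    return 1.0 / (1.0 + math.exp(-slope * value))


# Keep a value only if it survives the value-dependent dropout.
observed = [v for v in full if random.random() >= p_missing(v)]

full_mean = statistics.fmean(full)
observed_mean = statistics.fmean(observed)
missing_fraction = 1 - len(observed) / len(full)

print(f"mean of full data:      {full_mean:.3f}")
print(f"mean of observed data:  {observed_mean:.3f}")
print(f"fraction missing:       {missing_fraction:.2f}")
```

In this sketch the mean of the observed values sits well below the mean of the full data, which is why I am worried that ignoring the missingness mechanism, or imputing as if values were missing at random, could bias my results.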
I have read that PyMC allows missing data to be modeled directly as part of the Bayesian inference process. Can anyone share examples or resources that show how this is implemented effectively, particularly for missing-not-at-random (MNAR) scenarios?
Would you recommend using PyMC to model the missing data directly, or preprocessing the data with imputation techniques such as multiple imputation before fitting the Bayesian model?
Also, I have gone through this post: https://discourse.pymc.io/t/dealing-with-missing-data-and-custom-distribution-salesforce-commerce-cloud which definitely helped me out a lot.
Are there any common pitfalls to avoid when handling missing data in PyMC? For example, I have noticed that including too many predictors for imputation can sometimes slow down convergence.
Thanks in advance for your help.