Rejecting/thinning modes based on average density

I guess it depends on what you plan to do. If you only care about minimizing prediction error, focusing on a single posterior mode does not necessarily give you bad performance. Think of MLE or the Laplace approximation: you essentially ignore all the other, smaller modes and use what is hopefully the global maximum.
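To make the Laplace point concrete, here is a rough sketch, not anything specific to your model. The bimodal `log_post` below is a made-up toy posterior (a 0.9/0.1 mixture of two normals), and the grid search and finite-difference step are just the simplest things that work in one dimension. It finds the highest mode and fits a normal there from the local curvature, which is all a Laplace approximation does.

```python
import numpy as np

def log_post(theta):
    # Hypothetical toy posterior (an assumption for illustration):
    # 0.9 * N(0, 0.5^2) + 0.1 * N(5, 0.5^2), up to an additive constant.
    comp_big = np.log(0.9) - 0.5 * ((theta - 0.0) / 0.5) ** 2
    comp_small = np.log(0.1) - 0.5 * ((theta - 5.0) / 0.5) ** 2
    return np.logaddexp(comp_big, comp_small)

# Locate the highest mode with a plain grid search (fine in 1-D only).
grid = np.linspace(-3.0, 8.0, 11001)
theta_map = grid[np.argmax(log_post(grid))]

# Laplace approximation: match a normal to the curvature at the mode,
# using a central finite difference for the second derivative.
h = 1e-3
curv = (log_post(theta_map + h) - 2.0 * log_post(theta_map)
        + log_post(theta_map - h)) / h**2
laplace_sd = np.sqrt(-1.0 / curv)

print("mode:", theta_map, "Laplace sd:", laplace_sd)
```

The resulting normal describes the big mode only and carries no information about the smaller one near 5. Whether that matters is exactly the "depends on what you plan to do" part.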