Common reasons for getting a MAP estimate that is far from the mode of the posterior

I think the standard Bayesian approach would be to sample from the posterior and then calculate the MAE/MSE separately for each sample in your posterior. That would yield a distribution of MAE/MSE values. This propagates your uncertainty through the rest of your pipeline in the appropriate manner and allows you to make more meaningful decisions at the end.
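
Here is a rough sketch of what computing the metric per posterior sample could look like. To keep it self-contained it uses plain NumPy, and the "posterior draws" are just simulated stand-ins for a hypothetical linear model `y = a + b * x`. In practice you would pull the draws for your own parameters out of your trace instead; the names `a_draws`, `b_draws`, `x`, and `y` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical held-out data for a simple linear model y = a + b * x + noise.
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=x.size)

# Stand-in for posterior draws of (a, b). These are simulated here;
# with a real model they would come from the sampler output.
n_draws = 2000
a_draws = rng.normal(1.0, 0.05, size=n_draws)
b_draws = rng.normal(2.0, 0.10, size=n_draws)

# One prediction vector per posterior draw: shape (n_draws, n_obs).
preds = a_draws[:, None] + b_draws[:, None] * x[None, :]

# One MAE and one MSE value per posterior draw: shape (n_draws,).
mae_per_draw = np.abs(preds - y[None, :]).mean(axis=1)
mse_per_draw = ((preds - y[None, :]) ** 2).mean(axis=1)

# Summarize the resulting distribution of the metric, e.g. with an interval.
mae_lo, mae_hi = np.quantile(mae_per_draw, [0.03, 0.97])
print("MAE mean:", mae_per_draw.mean())
print("MAE 94% interval:", mae_lo, mae_hi)
```

The point is that `mae_per_draw` is a whole distribution rather than a single number, so you can report an interval or compare models on the full distribution.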

Yeah, this approach is not recommended for the reasons you mention (which are related to the explanation for the modes of the marginal distributions not matching the MAP).
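
If it helps to see why the marginal modes need not match the joint mode, here is a toy example with a made-up discrete joint distribution over two parameters (the probability table is purely illustrative, not from any real model):

```python
import numpy as np

# Hypothetical joint probability table over two parameters, each with
# three possible values. Rows index parameter 1, columns index parameter 2.
joint = np.array([
    [0.28, 0.00, 0.00],
    [0.00, 0.20, 0.18],
    [0.00, 0.18, 0.16],
])

# Joint mode: the single most probable (row, column) combination.
flat_idx = np.argmax(joint)
joint_mode = tuple(int(i) for i in np.unravel_index(flat_idx, joint.shape))

# Marginal modes: the most probable value of each parameter on its own.
marginal_rows = joint.sum(axis=1)
marginal_cols = joint.sum(axis=0)
marginal_modes = (int(np.argmax(marginal_rows)), int(np.argmax(marginal_cols)))

print("joint mode:", joint_mode)
print("marginal modes:", marginal_modes)
```

The single most probable cell sits in one corner, but most of the probability mass lives elsewhere, so each marginal peaks at a different value than the joint mode does.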

Speed can definitely be a concern. You might consider something like variational inference if you feel like it would be appropriate for the models you are using (e.g., lots of normal posteriors). Alternatively, you can look into the new backends and samplers PyMC has recently made available (JAX, nutpie, etc.). This recent PyMCon Web Series event covered these in some detail.

Using the MAP can be OK, but the important thing to keep in mind is that doing so throws a tremendous amount of information away. You can fit two models to the same data set and calculate their MAPs and associated performance metrics. But one of those models may have assigned extremely high credibility to the MAP parameter values, whereas the MAP of the other model may have been just as good (or bad) as any other set of parameter values. To put it succinctly: do you want a (likely) good answer slowly or a (potentially) incorrect answer quickly?
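
A quick toy illustration of that last point, using two simulated one-parameter "posteriors" that share the same mode but differ a lot in width. Everything here is hypothetical: the error function `error`, the shared mode `map_value`, and the normal shapes are just assumptions chosen to make the contrast visible.

```python
import numpy as np

rng = np.random.default_rng(1)

def error(theta):
    # Hypothetical performance metric as a function of the parameter value.
    return np.abs(theta - 0.3)

# Both made-up posteriors have their mode at the same parameter value.
map_value = 0.5
sharp_draws = rng.normal(map_value, 0.01, size=5000)  # very confident model
flat_draws = rng.normal(map_value, 1.00, size=5000)   # very uncertain model

# Evaluated at the MAP, the two models look identical.
error_at_map_sharp = error(map_value)
error_at_map_flat = error(map_value)

# Evaluated over the posterior draws, they look very different.
sharp_errors = error(sharp_draws)
flat_errors = error(flat_draws)

print("error at MAP:", error_at_map_sharp, error_at_map_flat)
print("sharp posterior error sd:", sharp_errors.std())
print("flat posterior error sd:", flat_errors.std())
```

The MAP-based metric cannot tell these two situations apart, whereas the spread of the per-draw metric makes the difference obvious.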