Common reasons for getting a MAP estimate that is far from the mode of the posterior

My keen interest in the MAP approach stems from two advantages: it is much faster, and it has what I call “predictive power” (which I will elaborate on below).

As part of a larger system, I run a model-selection task evaluating roughly 100-200 models, with the ultimate aim of capturing the effects of specific inputs on the response, i.e. accurately identifying parameter values. On some datasets, sampling the posterior for certain models with the default sampling settings (1k tune, 1k draws, 4 cores) can take 20-40 minutes, whereas MAP finishes in 1-10 seconds, which lets me evaluate 100-200 more candidates during model selection.
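To illustrate why MAP is so much cheaper: it replaces sampling with a single optimization of the log posterior. Below is a minimal sketch (not the original model, which is not shown) that finds the MAP of the mean of a normal model with a conjugate normal prior via `scipy.optimize.minimize`, and checks it against the closed-form posterior mode. The model, prior scale, and data here are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=50)  # toy data, unit noise

prior_sd = 10.0  # assumed N(0, 10^2) prior on the mean

def neg_log_post(mu):
    # Negative log posterior up to a constant: prior term + likelihood term
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = -0.5 * np.sum((y - mu) ** 2)
    return -(log_prior + log_lik)

map_est = minimize(neg_log_post, x0=np.array([0.0])).x[0]

# Conjugate-normal closed form: the posterior mode (= mean here)
analytic = np.sum(y) / (len(y) + 1.0 / prior_sd ** 2)
```

The single optimization above stands in for the thousands of log-posterior (and gradient) evaluations a sampler performs, which is where the 20-40 minutes vs 1-10 seconds gap comes from.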

Moreover, the metric used for model selection is rolling-window cross-validation with mean absolute error (MAE) or mean squared error (MSE), with a lookahead of 10-50 days. The reason is that the model must serve as a predictor over a 10-50 day horizon in a downstream task. What troubles me about sampling from the full posterior and then computing the mean or mode of each parameter separately (this seems suboptimal due to independence assumptions; I assume computing the centre of gravity of the joint multidimensional posterior would be better?) is that it usually yields worse estimates under the metric above.
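For concreteness, the evaluation scheme I mean can be sketched as follows. This is a generic rolling-window MAE harness, not the actual pipeline; the window size, step, and the naive mean forecaster in the usage example are all illustrative assumptions.

```python
import numpy as np

def rolling_cv_mae(series, fit, window=100, lookahead=10):
    """Rolling-window CV: fit on each training window, forecast
    `lookahead` steps ahead, score against the held-out future with MAE.

    `fit(train)` must return a callable `predict(h)` producing an
    h-step-ahead forecast as a 1-D array.
    """
    errors = []
    for start in range(0, len(series) - window - lookahead + 1, lookahead):
        train = series[start:start + window]
        future = series[start + window:start + window + lookahead]
        predict = fit(train)
        preds = predict(lookahead)
        errors.append(np.mean(np.abs(preds - future)))
    return float(np.mean(errors))

# Usage: a naive forecaster that repeats the training-window mean.
naive_fit = lambda train: lambda h: np.full(h, train.mean())
score = rolling_cv_mae(np.ones(200), naive_fit)  # constant series -> MAE 0.0
```

A MAP-based model slots in by making `fit` optimize the posterior on the window and return a forecaster built from the point estimate; a sampling-based variant would instead average forecasts over posterior draws.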

While I have considered Bayesian model-selection criteria such as the Bayesian information criterion (BIC) or leave-one-out cross-validation (LOO), I feel the metric described above is more aligned with the ultimate goal. Nonetheless, I remain open to changing my perspective on this.

Also, I would like to add that I really appreciate the effort you put into teaching us beginners; it is great to have a community like this one. Thank you.