I’ll try to explain one of the reasons by comparing it to what many machine learning algorithms call regularization. There are many situations in ML with a LOT of parameters. To prevent overfitting, and to help find the most relevant parameters while automatically discarding the irrelevant ones, a regularization term is added to the training loss function. Common examples are L1 and L2 regularization, which essentially penalize large parameter values. This lets the fitting procedure push irrelevant parameters toward zero and only move them away from zero if they are truly important during training.
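Just to make the idea concrete, here is a minimal sketch (my own illustrative example, not from any particular library) of a least-squares loss with an added L2 penalty; the function name and the `lam` strength parameter are hypothetical:

```python
import numpy as np

# Minimal sketch: least-squares loss for linear regression with an
# L2 penalty that discourages large weights (lam is the strength).
def l2_regularized_loss(w, X, y, lam=1.0):
    residuals = X @ w - y
    data_loss = 0.5 * np.sum(residuals ** 2)   # ordinary training loss
    penalty = 0.5 * lam * np.sum(w ** 2)       # L2 regularization term
    return data_loss + penalty

# L1 regularization would instead add lam * np.sum(np.abs(w)),
# which tends to push irrelevant weights exactly to zero.
```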
These added regularizations are equivalent to placing a Laplace (L1) or normal (L2) prior distribution on the parameters instead of a flat prior, so non-flat priors regularize automatically. The main difference between regularization and non-flat priors, in my opinion, is that in Bayesian inference you usually don’t look for the single most likely set of parameters (the MAP estimate) but sample from the full posterior distribution, so the results differ from standard ML.
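To see the equivalence numerically, here is a short sketch (assuming a zero-mean normal prior with an illustrative scale `sigma`): minus the log of that prior is exactly a quadratic L2 penalty plus a constant, which is why MAP estimation with a normal prior matches L2-regularized fitting.

```python
import numpy as np
from scipy.stats import norm

sigma = 2.0                       # prior scale, chosen for illustration
w = np.linspace(-3, 3, 7)         # a few example weight values

neg_log_prior = -norm.logpdf(w, loc=0.0, scale=sigma)
l2_penalty = 0.5 * w**2 / sigma**2

# The two differ only by the constant log(sigma * sqrt(2*pi)),
# which does not affect where the minimum is.
print(np.allclose(neg_log_prior - l2_penalty,
                  np.log(sigma * np.sqrt(2 * np.pi))))   # True
```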