If you tend to have small amounts of data for each individual model, the parameter estimates you obtain from the hierarchical model can have much lower variance and/or less bias than a summary statistic of the per-individual parameter estimates. This often translates into much better out-of-sample prediction performance. That said, it can be very fast to simply fit a ton of models in parallel without hierarchical structure.
There’s some nice discussion on the merits of simply running lots of simple models here.
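To make the variance point above concrete, here is a minimal simulation sketch. It is not a full hierarchical fit: it uses a simple empirical-Bayes shrinkage of group means as a stand-in for partial pooling, and every number in it (200 groups, 3 observations each, the true mean and standard deviations) is an assumption chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: many individuals, very little data for each one.
J, n = 200, 3                    # number of individuals, observations per individual
mu, tau, sigma = 5.0, 1.0, 2.0   # population mean, between-individual sd, noise sd

theta = rng.normal(mu, tau, size=J)                 # true per-individual parameters
y = rng.normal(theta[:, None], sigma, size=(J, n))  # observed data, shape (J, n)

# No pooling: one independent estimate per individual.
ybar = y.mean(axis=1)

# Partial pooling via empirical Bayes (method-of-moments variance estimates).
sigma2_hat = y.var(axis=1, ddof=1).mean()        # pooled within-individual variance
se2 = sigma2_hat / n                             # sampling variance of each group mean
tau2_hat = max(ybar.var(ddof=1) - se2, 0.0)      # between-individual variance
mu_hat = ybar.mean()                             # estimated population mean
shrink = tau2_hat / (tau2_hat + se2)             # weight kept on the individual's own mean
theta_pp = mu_hat + shrink * (ybar - mu_hat)     # estimates shrunk toward mu_hat

mse_none = np.mean((ybar - theta) ** 2)
mse_pp = np.mean((theta_pp - theta) ** 2)
print(f"no pooling MSE:      {mse_none:.3f}")
print(f"partial pooling MSE: {mse_pp:.3f}")
```

With this little data per individual, the shrunken estimates should land closer to the true parameters on average than the independent ones. As `n` grows, `shrink` approaches 1 and the two sets of estimates converge, which is the regime where fitting lots of separate models in parallel costs you little.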