Thinking about it a bit more, I’m not sure it makes sense to do a huge battery of model comparisons – the search space is too big. If you have information sharing between time series (i.e. hierarchical parameters), you are going to need a quasi-automatic way to pick the lag order. I’d recommend at the very least putting Laplace priors on the lag coefficients, so that lags the data don’t support get shrunk toward zero.
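To make the Laplace prior suggestion concrete, here is a rough sketch in Python, not a full Bayesian fit. It assumes Gaussian noise and a single series (no hierarchy), in which case the MAP estimate under independent Laplace priors on the lag coefficients is the same as an L1-penalised least-squares fit. The function names (`lag_matrix`, `laplace_map_ar`) and the `penalty` value are made up for illustration; `penalty` plays the role of noise variance divided by the Laplace scale and would need tuning.

```python
import numpy as np

def lag_matrix(y, max_lag):
    """Return (X, target); column k-1 of X holds y lagged by k steps."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack(
        [y[max_lag - k : len(y) - k] for k in range(1, max_lag + 1)]
    )
    return X, y[max_lag:]

def soft_threshold(z, t):
    """Shrink z toward zero by t, clipping at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def laplace_map_ar(y, max_lag, penalty, n_iter=500):
    """MAP AR coefficients under Laplace priors (hypothetical helper).

    Minimises 0.5/n * ||target - X @ beta||^2 + penalty * ||beta||_1
    by cyclic coordinate descent. Assumes the series is not constant.
    """
    y = np.asarray(y, dtype=float)
    X, target = lag_matrix(y - y.mean(), max_lag)
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = target.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]      # add back lag j's contribution
            rho = X[:, j] @ resid / n       # correlation of lag j with the rest
            beta[j] = soft_threshold(rho, penalty) / col_sq[j]
            resid -= X[:, j] * beta[j]      # remove the updated contribution
    return beta

# Simulated AR(1) series; offer 7 candidate lags and let the penalty prune them.
rng = np.random.default_rng(0)
noise = rng.normal(size=3000)
y = np.zeros(3000)
for t in range(1, 3000):
    y[t] = 0.6 * y[t - 1] + noise[t]

beta = laplace_map_ar(y, max_lag=7, penalty=0.1)
print(np.round(beta, 3))
```

The first coefficient should carry most of the weight while the higher lags are pulled to or near zero, which is the quasi-automatic lag selection I had in mind. In a full posterior fit the Laplace prior shrinks rather than producing exact zeros, so treat this only as intuition for what the prior does.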
I also don’t think you ever need to contemplate more lags than the seasonal order of the model. For example, if you have weekly data, the max lag I’d consider is 4; anything beyond that I’d treat as monthly seasonality and handle with Fourier bases or something similar. For daily data I’d consider up to 7 lags, for monthly data up to 12, etc.
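As a sketch of the rule of thumb above, again in Python: a lookup for the max lag by data frequency, plus a Fourier basis to absorb whatever seasonality sits beyond that lag. The names `MAX_LAG_BY_FREQUENCY` and `fourier_features` are hypothetical, and the period of roughly 4.35 weeks per month and the choice of 2 harmonics are my assumptions, not anything fixed.

```python
import numpy as np

# Hypothetical lookup encoding the max lag I'd consider per data frequency.
MAX_LAG_BY_FREQUENCY = {"daily": 7, "weekly": 4, "monthly": 12}

def fourier_features(t, period, order):
    """Columns sin(2*pi*k*t/period) then cos(2*pi*k*t/period), k = 1..order."""
    t = np.asarray(t, dtype=float)
    k = np.arange(1, order + 1)
    angles = 2.0 * np.pi * np.outer(t, k) / period
    return np.hstack([np.sin(angles), np.cos(angles)])

# Weekly data: cap the AR terms at 4 lags and model the monthly cycle
# with a small Fourier basis instead of extra lags.
weeks = np.arange(104)
weeks_per_month = 365.25 / 12 / 7          # assumed average month length
seasonal_X = fourier_features(weeks, period=weeks_per_month, order=2)
max_lag = MAX_LAG_BY_FREQUENCY["weekly"]
print(max_lag, seasonal_X.shape)
```

The Fourier columns go into the regression alongside the (few) lag terms, so the autoregressive part only has to mop up the short-term leftovers.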
tl;dr: treat autoregressive terms like a garbage can for unexplainable, short-term effects. If you can explicitly model other structural elements (seasonality, trend, cycle, etc.), do that instead of relying on autoregression.