I am trying to model a collection of timeseries as a hierarchical model, where the parameters of each individual timeseries have prior distributions, and those priors in turn share common hyperprior distributions across all the timeseries.
I noticed that when I fit a flat (non-hierarchical) model to an individual timeseries, the fit is much better than the fit for that same timeseries when it is included in the hierarchical model.
Is that expected behavior?
Also, what is the point of fitting a hierarchical model, as opposed to a collection of flat models? Is it more computationally efficient? Are there any other advantages? Thank you.
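Yes, that is expected: the hierarchical model pulls (shrinks) each timeseries' parameters toward the group-level mean, so its in-sample fit to any single series is usually worse than a flat model fitted to that series alone. The payoff comes elsewhere. If you tend to have only small amounts of data for each individual model, the parameter estimates you obtain from the hierarchical model can have much lower variance and/or less bias than a summary statistic of the per-individual parameter estimates. This can often translate into much better out-of-sample predictive performance. That said, it can be very fast to simply fit a ton of models in parallel without hierarchical structure.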
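To make the trade-off concrete, here is a minimal sketch using plain NumPy. It is not a PyMC model and not a timeseries model: as an assumption for simplicity, each "group" is a handful of i.i.d. Normal observations with known noise `sigma`, and partial pooling is done with a simple empirical-Bayes (normal-normal) shrinkage estimator standing in for a full hierarchical fit. The helper names `shrinkage_estimates` and `in_sample_mse` are hypothetical. It compares the per-group ("flat") means with the shrunken estimates on both in-sample fit and error against the true parameters.

```python
import numpy as np


def shrinkage_estimates(groups, sigma):
    """Hypothetical helper: empirical-Bayes partial pooling of group means.

    Assumes y_ij ~ Normal(theta_j, sigma^2) with sigma known, and
    theta_j ~ Normal(mu, tau^2), with mu and tau^2 estimated from the data.
    Returns (partially pooled estimates, unpooled per-group means).
    """
    n = np.array([len(g) for g in groups])
    ybar = np.array([np.mean(g) for g in groups])
    se2 = sigma**2 / n                          # sampling variance of each group mean
    mu = ybar.mean()                            # grand-mean estimate of mu
    # Method-of-moments estimate of the between-group variance, truncated at 0
    tau2 = max(ybar.var(ddof=1) - se2.mean(), 0.0)
    # Weight on each group's own mean: 1 = no pooling, 0 = complete pooling
    w = tau2 / (tau2 + se2)
    return w * ybar + (1 - w) * mu, ybar


def in_sample_mse(groups, estimates):
    """Hypothetical helper: mean squared residual of the data around each group's estimate."""
    return np.mean(np.concatenate([(g - e) ** 2 for g, e in zip(groups, estimates)]))


rng = np.random.default_rng(0)
J, n_per_group, sigma, tau = 50, 3, 2.0, 1.0    # many groups, little data per group
theta = rng.normal(0.0, tau, size=J)            # true group-level parameters
groups = [rng.normal(t, sigma, size=n_per_group) for t in theta]

pooled, unpooled = shrinkage_estimates(groups, sigma)

# In-sample fit: the per-group mean minimizes squared error within each group,
# so the flat estimates always fit the observed data at least as well.
fit_unpooled = in_sample_mse(groups, unpooled)
fit_pooled = in_sample_mse(groups, pooled)

# Error against the true parameters: partial pooling is typically much closer.
err_unpooled = np.mean((unpooled - theta) ** 2)
err_pooled = np.mean((pooled - theta) ** 2)

print(f"in-sample MSE   flat: {fit_unpooled:.3f}  pooled: {fit_pooled:.3f}")
print(f"parameter error flat: {err_unpooled:.3f}  pooled: {err_pooled:.3f}")
```

In this simplified setting the flat estimates win on in-sample fit by construction, which mirrors the behavior described in the question, while the shrunken estimates are typically closer to the true parameters when each group has only a few observations. With many observations per group, the weight `w` approaches 1 and the two approaches converge.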
There’s some nice discussion on the merits of simply running lots of simple models here.
On top of that, sometimes there is a genuine reason for the hierarchy: the data-generating process itself may demand it, as in the classic radon contamination example in the PyMC3 docs.