This makes perfect sense, and I agree that it is better to fit a trend term than to rely on something that could just be random noise. If I'm understanding correctly, adding a drift to the GRW would mean adding a per-president term, `drift_innovations`, normally distributed with a small sigma, to `rw_innovations` (code below).
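```python
import aesara.tensor as aet
import pymc as pm

# (inside the `with pm.Model(coords=COORDS):` block)
# need the cumsum parametrization to properly control the init of the GRW
rw_init = aet.zeros(shape=(len(COORDS["president"]), 1))
# one drift value per president, broadcast over (added to) every month's innovation
drift_innovations = pm.Normal(
    "drift_innovations",
    sigma=0.15,
    shape=(len(COORDS["president"]), 1),
)
rw_innovations = pm.Normal(
    "rw_innovations",
    dims=("president", "month_minus_origin"),
)
# cumsum of (drift + noise) = linear trend in time + driftless random walk
raw_rw = aet.cumsum(
    aet.concatenate([rw_init, drift_innovations + rw_innovations], axis=-1), axis=-1
)
# note: sd scales the drift as well as the noise
sd = pm.HalfNormal("shrinkage_pop", 0.2)
month_president_effect = pm.Deterministic(
    "month_president_effect", raw_rw * sd, dims=("president", "month")
)
```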
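So then, after fitting, I could extrapolate by continuing the GRW from its last fitted value, with the drift accumulating at each step:
```python
# fresh unit-scale innovations for the forecast horizon, continuing from the last fitted value
rw_innovations_future = pm.Normal("rw_innovations_future", shape=(len(COORDS["president"]), horizon))
raw_rw_future = raw_rw[:, -1:] + aet.cumsum(drift_innovations + rw_innovations_future, axis=-1)
trend_extrapolation = raw_rw_future * sd  # drift accumulates step by step, scaled by sd as in the fit
```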
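As a sanity check on what such an extrapolation should look like, here is a NumPy sketch for one president and one posterior draw. The values `last_value=1.0`, `drift=0.15`, `sd=0.2` and `horizon=12` are made up for illustration, and `extrapolate` is a hypothetical helper, not a PyMC function. Starting from the last fitted effect, the forecast mean should grow linearly (`sd * drift * h`) while the spread grows like `sd * sqrt(h)`, which is what distinguishes an accumulated drift from a constant offset.

```python
import numpy as np


def extrapolate(last_value, drift, sd, horizon, n_draws, rng):
    """Simulate future paths of sd * (drifted random walk), starting from last_value."""
    innovations = rng.normal(size=(n_draws, horizon))
    # the drift is added at every step, so after h steps it contributes h * drift
    future_raw = np.cumsum(drift + innovations, axis=-1)
    return last_value + sd * future_raw


rng = np.random.default_rng(42)
horizon = 12
paths = extrapolate(last_value=1.0, drift=0.15, sd=0.2, horizon=horizon, n_draws=20000, rng=rng)

h = np.arange(1, horizon + 1)
expected_mean = 1.0 + 0.2 * 0.15 * h  # linear in the horizon
expected_sd = 0.2 * np.sqrt(h)        # random-walk spread grows like sqrt(h)
forecast_mean = paths.mean(axis=0)
forecast_sd = paths.std(axis=0)
```

In the full model the same computation would be repeated for every posterior draw of `drift_innovations` and `sd`, and for every president (the drift then has shape `(n_presidents, 1)` and broadcasts over the horizon).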
As you said, this would be equivalent to adding a deterministic time trend, but I like this approach better because the trend is built into the RW itself. The part I'm failing to understand is the other approach: modeling the mean as another GRW. What would the time dimension of this second RW be? And wouldn't I need to extrapolate it somehow as well?