Forecasting using Gaussian Random Walk

I recently found @AlexAndorra's excellent blog post, where he models the popularity of French presidents with a Gaussian random walk. The model he used looks like this (for more context, refer to the aforementioned blog post):

with pm.Model(coords=COORDS) as hierarchical_popularity:

    baseline = pm.Normal("baseline")
    president_effect = ZeroSumNormal("president_effect", sigma=0.15, dims="president")
    house_effect = ZeroSumNormal("house_effect", sigma=0.15, dims="pollster_by_method")
    month_effect = ZeroSumNormal("month_effect", sigma=0.15, dims="month")

    # need the cumsum parametrization to properly control the init of the GRW
    rw_init = aet.zeros(shape=(len(COORDS["president"]), 1))
    rw_innovations = pm.Normal(
        "rw_innovations",
        dims=("president", "month_minus_origin"),
    )
    raw_rw = aet.cumsum(aet.concatenate([rw_init, rw_innovations], axis=-1), axis=-1)
    sd = pm.HalfNormal("shrinkage_pop", 0.2)
    month_president_effect = pm.Deterministic(
        "month_president_effect", raw_rw * sd, dims=("president", "month")
    )

    popularity = pm.math.invlogit(
        baseline
        + president_effect[president_id]
        + month_effect[month_id]
        + month_president_effect[president_id, month_id]
        + house_effect[pollster_by_method_id]
    )

    # overdispersion parameter
    theta = pm.Exponential("theta_offset", 1.0) + 10.0

    N_approve = pm.BetaBinomial(
        "N_approve",
        alpha=popularity * theta,
        beta=(1.0 - popularity) * theta,
        n=data["samplesize"],
        observed=data["num_approve"],
        dims="observation",
    )


I'm curious to know if it is possible to extend this to a forecasting problem:

• Suppose a presidential term goes beyond 60 months: how can you continue the forecasts for the consecutive months beyond 60? Once the model has been trained, is there a way to keep pushing the Gaussian random walk month_president_effect and the month_effect components into the following months?
• Related to my previous question: will this forecast continue the trend from the last months, or does a new term need to be added to the model to capture the trend, e.g. \beta_0 * time?

Overall, I was thinking of taking the month_president_effect and the month_effect components, computing the one-step differences between consecutive elements, and sampling from them to generate "corrections" that could be accumulated (cumsum) and appended to the last value of those components. I'm not sure if this sounds reasonable.
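To make the idea concrete, here is a minimal numpy sketch of that resampling scheme. All names here are hypothetical: `effect_draws` stands in for posterior draws of one of those monthly components, with shape `(n_draws, n_months)`, and is simulated below just so the code runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of a monthly effect: (n_draws, n_months)
effect_draws = rng.normal(size=(500, 60)).cumsum(axis=-1) * 0.1

horizon = 12
# One-step differences within the historical window
diffs = np.diff(effect_draws, axis=-1)                  # (500, 59)
# Resample historical differences to mimic future "corrections"
idx = rng.integers(0, diffs.shape[-1], size=(diffs.shape[0], horizon))
future_diffs = np.take_along_axis(diffs, idx, axis=-1)  # (500, 12)
# Accumulate the corrections and append to the last observed value
forecast = effect_draws[:, -1:] + future_diffs.cumsum(axis=-1)
```

This is just a bootstrap over historical increments; whether it is justified for this model is exactly the question posed above.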

Looking forward to ideas and discussion on this topic.

I wrote a post about the statistics of the GRW model and how it should forecast here. Applying it in your case would be a bit more complex: you would use the procedure I outline in that post to construct the latent GRW process for t+h forecast steps of raw_rw, then combine it with all the other time-invariant parts of your model posterior to generate forecasts.
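In outline (a sketch of the general procedure, not the exact code from that post), pushing the latent walk forward h steps per posterior draw is just a cumulative sum of fresh standard-normal innovations added to the last latent value, rescaled the way the model rescales raw_rw. The posterior quantities below are simulated placeholders so the snippet is self-contained:

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder posterior draws: last latent value and scale, one per draw.
# In practice these would come from the trace (raw_rw[..., -1], shrinkage_pop).
n_draws, horizon = 1000, 12
last_rw = rng.normal(size=(n_draws, 1))
sd_draws = np.abs(rng.normal(0.0, 0.2, size=(n_draws, 1)))

# Fresh standard-normal innovations, since rw_innovations ~ Normal(0, 1)
new_innovations = rng.normal(size=(n_draws, horizon))
raw_rw_forecast = last_rw + new_innovations.cumsum(axis=-1)

# Rescale exactly as in the model to get the forecast effect
effect_forecast = raw_rw_forecast * sd_draws
```

Note how the forecast uncertainty grows with the horizon, which is the main reason not to expect too much from the GRW out of sample.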

Don't expect too much from the GRW out of sample.

Thanks for this, Jesse! This is very insightful. Following up on this, does my comment about computing the one-step difference between consecutive elements of the GRW make any sense? In my head, I see these differences as a way to capture the overall trend and trend changes within the historical data. So if I extract them and sample from them to mimic similar behavior into the future, I should be able to keep pushing the same trend observed in the past. I'm not sure if this sounds reasonable or if it has any theoretical / practical support.

I've never heard of it done the way you propose, but I don't see why you couldn't. Everything is worth a try, imo. My only concern is that you're reading a dynamic into the model that the model didn't actually fit, and thus it could just be random noise. Whatever the difference of the last two posterior values of the GRW happens to be will lead to a deterministic trend, but the model doesn't "know" anything about this trend term, so it's questionable whether you have evidence that it exists. In addition, since you don't model the mean of the GRW, I think the differences will always be mean zero, but it's late and I didn't do anything to verify this intuition, so I could be wrong.

One note, though: models that include a trend which extrapolates out from the last observed trend tend to forecast poorly, and many time series models introduce a "dampened trend" to correct for this. See the discussion here. This is close to what you are proposing, so I also think you should be cautious about naively extrapolating a trend from the GRW posterior for this reason. (I'm not suggesting you should worry about a dampened trend in your model, just pointing out that trend extrapolation isn't so simple.)
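To illustrate what dampening does (a toy sketch with made-up numbers, not part of the model above): with a damping factor 0 < phi < 1, each extrapolated step contributes phi^h of the last estimated trend, so the forecast levels off instead of growing linearly forever:

```python
import numpy as np

last_value, last_trend = 10.0, 0.5
phi, horizon = 0.8, 20

steps = np.arange(1, horizon + 1)
# Undamped: linear extrapolation of the last trend
undamped = last_value + last_trend * steps
# Damped: contributions shrink geometrically, forecast flattens out
damped = last_value + last_trend * np.cumsum(phi ** steps)
```

As the horizon grows, the damped forecast converges to last_value + last_trend * phi / (1 - phi), while the undamped one diverges.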

If I were going to add a trend to your model, I would either include a deterministic time trend (this is what you suggest in your 2nd point), or explicitly model the mean of the rw_innovations (I guess people call that a "drift", though if you do the algebra it comes to the same thing; I call this \mu in the post I linked). The mean itself could also be a GRW; then you'd get something like a "local level" model where you estimate both the "position" and "velocity" of the process. I think this is close to what you are thinking about with the "corrections" approach.
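The "comes to the same thing" algebra is easy to check numerically: a walk whose innovations have mean \mu is identical to a deterministic trend \mu t plus a zero-mean walk, since cumsum(\mu + eps)_t = \mu t + cumsum(eps)_t. A quick numpy check with arbitrary values:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, T = 0.3, 50

# GRW with drift: x_t = x_{t-1} + mu + eps_t
eps = rng.normal(size=T)
x = np.cumsum(mu + eps)

# Algebraically identical: deterministic trend + zero-mean GRW
trend = mu * np.arange(1, T + 1)
x_alt = trend + np.cumsum(eps)
```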

With either of these approaches you donâ€™t need to worry about any complicated post-processing. The trend will just fall out of the estimation and can be used to forecast.


This makes perfect sense, and I agree that it is better to fit a trend term instead of relying on something that could just be random noise. If I'm understanding correctly, the idea of adding a drift to the GRW would be to add a term drift_innovations to rw_innovations that is normally distributed with a small sigma (code below).

# need the cumsum parametrization to properly control the init of the GRW
rw_init = aet.zeros(shape=(len(COORDS["president"]), 1))
drift_innovations = pm.Normal(
    "drift_innovations",
    sigma=0.15,
    shape=(len(COORDS["president"]), 1),
)
rw_innovations = pm.Normal(
    "rw_innovations",
    dims=("president", "month_minus_origin"),
)
raw_rw = aet.cumsum(
    aet.concatenate([rw_init, drift_innovations + rw_innovations], axis=-1), axis=-1
)
sd = pm.HalfNormal("shrinkage_pop", 0.2)
month_president_effect = pm.Deterministic(
    "month_president_effect", raw_rw * sd, dims=("president", "month")
)


So then, after training, I could extrapolate by sampling from a GRW like this:

raw_rw = pm.GaussianRandomWalk("raw_rw", steps=horizon)
trend_extrapolation = drift_innovations + raw_rw * sd


As you said, this would be equivalent to adding a deterministic time trend, but I like this approach better, as the trend is nicely incorporated within the RW. The part I'm failing to understand is the other approach of modeling the mean as another GRW. What would be my time dimension in this second RW? And wouldn't I need to extrapolate it somehow as well?

I imagined it would work something like this:

with pm.Model() as model:

    sigma_level = pm.HalfNormal('sigma_level', sigma=1.0)
    sigma_trend = pm.Gamma('sigma_trend', alpha=1, beta=100)
    sigma_obs = pm.HalfNormal('sigma_obs', 1.0)

    trend = pm.Normal('trend', mu=0, sigma=sigma_trend, shape=n_obs).cumsum()

    # No need for a time shape here. It will inherit the shape of the mean
    mu = pm.Normal('mu', mu=trend, sigma=sigma_level).cumsum()

    obs = pm.Normal('obs', mu=mu, sigma=sigma_obs, observed=nile.height)

    llevel_trace = pm.sample(target_accept=0.95, init='jitter+adapt_diag_grad')


I tested it and it doesn't work very well. I'd just stick to exactly what you did, adding a non-zero mean to the GRW to induce a drift.


Thanks, Jesse! This is super helpful and worth a try. Just adding a non-zero mean to the GRW to induce a drift seems to be working, but I'm curious to try the approach you outlined. Thanks again.
