Forecasting using Gaussian Random Walk

I recently found @AlexAndorra's excellent blog post where he models the popularity of French presidents with a Gaussian random walk. The model he uses looks like this (for more context, refer to the linked blog post):

# imports assumed (not shown in the original post); ZeroSumNormal is a helper
# defined in the linked blog post (recent PyMC versions ship pm.ZeroSumNormal)
import aesara.tensor as aet
import pymc as pm

with pm.Model(coords=COORDS) as hierarchical_popularity:

    baseline = pm.Normal("baseline")
    president_effect = ZeroSumNormal("president_effect", sigma=0.15, dims="president")
    house_effect = ZeroSumNormal("house_effect", sigma=0.15, dims="pollster_by_method")
    month_effect = ZeroSumNormal("month_effect", sigma=0.15, dims="month")

    # need the cumsum parametrization to properly control the init of the GRW
    rw_init = aet.zeros(shape=(len(COORDS["president"]), 1))
    rw_innovations = pm.Normal(
        "rw_innovations",
        dims=("president", "month_minus_origin"),
    )
    raw_rw = aet.cumsum(aet.concatenate([rw_init, rw_innovations], axis=-1), axis=-1)
    sd = pm.HalfNormal("shrinkage_pop", 0.2)
    month_president_effect = pm.Deterministic(
        "month_president_effect", raw_rw * sd, dims=("president", "month")
    )

    # linear predictor for approval, mapped to (0, 1) through the logit link
    popularity = pm.math.invlogit(
        baseline
        + president_effect[president_id]
        + month_effect[month_id]
        + month_president_effect[president_id, month_id]
        + house_effect[pollster_by_method_id]
    )

    # overdispersion parameter
    theta = pm.Exponential("theta_offset", 1.0) + 10.0

    N_approve = pm.BetaBinomial(
        "N_approve",
        alpha=popularity * theta,
        beta=(1.0 - popularity) * theta,
        n=data["samplesize"],
        observed=data["num_approve"],
        dims="observation",
    )

I’m curious to know whether it is possible to extend this into a forecasting problem:

  • Suppose a presidential term runs beyond 60 months: how can you continue the forecasts for the consecutive months beyond 60? Once the model has been fit, is there a way to keep pushing the Gaussian random walk month_president_effect and the month_effect components into the following months?
  • And related to my previous question: will this forecast continue the trend from the last months, or does a new term need to be added to the model to capture the trend, e.g. \beta_0 * time (see the sketch after this list)?
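
For concreteness, the kind of deterministic trend term I mean in the second bullet might look like this inside the model (a minimal sketch; beta_0 is a hypothetical name, and month_id is the month index used in the model above):

beta_0 = pm.Normal("beta_0", sigma=0.1)
time_trend = beta_0 * month_id  # would be added inside the invlogit sum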

Overall, I was thinking of taking the month_president_effect and the month_effect components, computing the one-step differences between consecutive elements, and sampling from those differences to generate “corrections” that could be accumulated (cumsum) and appended to the last value of each component (see the sketch below). Not sure if this sounds reasonable.
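
In NumPy terms, the idea would be something like this (a minimal sketch of the resampling scheme; effect stands for the posterior trajectory of one of those components, and horizon is a hypothetical name):

import numpy as np

rng = np.random.default_rng(0)
horizon = 12  # hypothetical number of months to extrapolate

diffs = np.diff(effect)                              # one-step differences of the fitted component
corrections = rng.choice(diffs, size=horizon)        # resample "corrections" from the historical differences
extrapolation = effect[-1] + np.cumsum(corrections)  # accumulate and append to the last value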

Looking forward to ideas and discussion on this topic.

I wrote a post about the statistics of the GRW model and how it should forecast here. Applying it in your case would be a bit more complex: you would use the procedure I outline in that post to construct the latent GRW process for t+h forecast steps of raw_rw, then combine it with all the other time-invariant parts of your model posterior to generate forecasts.
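
Roughly, one way to do that post-processing might look like the following (a minimal sketch, assuming idata is the InferenceData returned by pm.sample() on the model above; horizon is a hypothetical name):

import numpy as np

rng = np.random.default_rng(0)
horizon = 12  # hypothetical number of forecast months

post = idata.posterior.stack(sample=("chain", "draw"))
sd = post["shrinkage_pop"].values                            # (n_samples,)
last = post["month_president_effect"].isel(month=-1).values  # (n_presidents, n_samples)

# fresh standard-normal innovations for each forecast step and posterior draw,
# cumulated, rescaled by sd, and anchored at the last in-sample value
eps = rng.standard_normal((horizon,) + last.shape)
future_effect = last + sd * np.cumsum(eps, axis=0)           # (horizon, n_presidents, n_samples)

The resulting future_effect can then be combined with the posterior draws of baseline, president_effect, and house_effect through the same inverse-logit link to produce forecast approval shares.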

Don’t expect too much from the GRW out of sample.

Thanks for this, Jesse! This is very insightful. Following up on this: does my comment about computing the one-step differences between consecutive elements of the GRW make any sense? In my head, I see these differences as a way to capture the overall trend and trend changes within the historical data. So if I extract them and sample from them to mimic similar behavior into the future, I should be able to keep pushing the same trend observed in the past. Not sure if this sounds reasonable or whether it has any theoretical / practical support.

I’ve never heard of it done the way you propose, but I don’t see why you couldn’t. Everything is worth a try imo. My only concern is that you’re reading a dynamic into the model that the model didn’t actually fit, so it could just be random noise. Whatever the difference of the last two posterior values of the GRW happens to be will lead to a deterministic trend, but the model doesn’t “know” anything about this trend term, so it’s questionable whether you have evidence that it exists. In addition, since you don’t model the mean of the GRW, I think the differences will always be mean zero, but it’s late and I didn’t do anything to verify this intuition, so I could be wrong.

One note, though: models that include a trend which extrapolates out from the last observed trend tend to forecast poorly, and many time series models introduce a “damped trend” to correct for this. See the discussion here. This is close to what you are proposing, so I also think you should be cautious about naively extrapolating a trend from the GRW posterior. (I’m not suggesting you should worry about a damped trend in your model, just pointing out that trend extrapolation isn’t so simple.)

If I were going to add a trend to your model, I would either include a deterministic time trend in the model (this is what you suggest in your 2nd point), or explicitly model the mean of the rw_innovations (I guess people call that a “drift”, though if you do the algebra it comes to the same thing; I call this \mu in the post I linked). The mean itself could also be a GRW, in which case you’d get something like a “local level” model where you estimate both the “level” and the “velocity” of the process. I think this is close to what you are thinking about with the “corrections” approach.
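
For instance, the drift option could be as simple as giving the innovations a learned non-zero mean (a minimal sketch; mu_drift is a hypothetical name):

# per-president drift, estimated from the data
mu_drift = pm.Normal("mu_drift", sigma=0.1, dims="president")

# same innovations as in the original model, but with a non-zero mean
rw_innovations = pm.Normal(
    "rw_innovations",
    mu=mu_drift[:, None],
    dims=("president", "month_minus_origin"),
)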

With either of these approaches you don’t need to worry about any complicated post-processing. The trend will just fall out of the estimation and can be used to forecast.

This makes perfect sense, and I agree that it is better to fit a trend term instead of relying on something that could just be random noise. If I’m understanding correctly, the idea of adding a drift to the GRW would be to add a term drift_innovations to rw_innovations that is normally distributed with a small sigma (code below).

# need the cumsum parametrization to properly control the init of the GRW
rw_init = aet.zeros(shape=(len(COORDS["president"]), 1))

# constant per-president drift, broadcast across all months
drift_innovations = pm.Normal(
    "drift_innovations",
    sigma=0.15,
    shape=(len(COORDS["president"]), 1),
)
rw_innovations = pm.Normal(
    "rw_innovations",
    dims=("president", "month_minus_origin"),
)

# the drift is added to every innovation before the cumsum, so it accumulates
# linearly over time
raw_rw = aet.cumsum(aet.concatenate([rw_init, drift_innovations + rw_innovations], axis=-1), axis=-1)
sd = pm.HalfNormal("shrinkage_pop", 0.2)
month_president_effect = pm.Deterministic(
    "month_president_effect", raw_rw * sd, dims=("president", "month")
)

So then, after training, I could extrapolate by sampling from a GRW like this:

# the drift goes into the walk's mean so that it keeps accumulating at each
# forecast step; the forecast is anchored at the last in-sample value
raw_rw = pm.GaussianRandomWalk("raw_rw", mu=drift_innovations, steps=horizon)
trend_extrapolation = month_president_effect[..., -1:] + raw_rw * sd

As you said, this would be equivalent to adding a deterministic time trend, but I like this approach better, as the trend is nicely incorporated within the RW. The part I’m failing to understand is the other approach of modeling the mean as another GRW. What would be my time dimension in this second RW? And wouldn’t I need to extrapolate it somehow as well?

I imagined it would work something like this:

with pm.Model() as model:

    sigma_level = pm.HalfNormal('sigma_level', sigma=1.0)
    sigma_trend = pm.Gamma('sigma_trend', alpha=1, beta=100)
    sigma_obs = pm.HalfNormal('sigma_obs', 1.0)

    # the drift ("trend") is itself a GRW; it shares the time dimension of the
    # observed data (n_obs is the number of observations)
    trend = pm.Normal('trend', mu=0, sigma=sigma_trend, shape=n_obs).cumsum()

    # No need for a time shape here. It will inherit the shape of the mean
    mu = pm.Normal('mu', mu=trend, sigma=sigma_level).cumsum()

    obs = pm.Normal('obs', mu=mu, sigma=sigma_obs, observed=nile.height)

    llevel_trace = pm.sample(target_accept=0.95, init='jitter+adapt_diag_grad')

I tested it and it doesn’t work very well. I’d just stick to exactly what you did, adding a non-zero mean to the GRW to induce a drift.

Thanks, Jesse! This is super helpful and definitely worth a try. Just adding a non-zero mean to the GRW to induce a drift seems to be working, but I’m curious to try the approach you outlined as well. Thanks again.
