Forecasting with Time Varying Linear Models

I have several questions about the Rolling Regression — PyMC example gallery notebook that are different enough from my previous thread on this model (Partial Observable Latent Variables) that I thought it was best to post them separately.

My questions relate both to extending the model to a forecasting context and to general model evaluation.

  1. When extending a rolling regression model to a forecasting context, what are good approaches for projecting the regression coefficients alpha and beta forward? (In the exact model linked there are no covariates in the forecast period, but in the example I have in mind there are forward forecasts for the relevant covariate.) Naively I could just carry the last values forward, but that approach seems like it would underestimate the uncertainty coming from parameter drift.
  2. Are there approaches to evaluating x-step-ahead predictive performance? The simplest way of doing this, retraining the model on a rolling basis against a hold-out test set (see the sketch after this list), suffers from performance limitations since the model takes roughly 600 seconds to fit.
  3. Looking at the rolling coefficient estimates (plotted below), I see persistent differences depending on which time period I use for fitting. I’m wondering how to reason about priors that control how locally the data influences the rolling parameter estimates (is this potentially a use case for a Gaussian process?).
  4. Looking at the model in the linked notebook, I see fairly strong and time-varying correlation between the alpha and beta parameters, and I’m curious how to interpret this feature of the model.
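
For concreteness, here is a rough sketch of the rolling-origin evaluation I mean in question 2, using the create_rw_model helper defined in the snippet further down; the cutoffs, step size, and horizon are just illustrative.

# Rolling-origin evaluation: refit on an expanding window, then score the
# next `h` observations against a held-out slice. This is the procedure that
# becomes painful when a single fit takes ~600 seconds.
h = 5  # forecast horizon to score
for cutoff in range(250, len(prices) - h, 50):
    train = prices.iloc[:cutoff]
    test = prices.iloc[cutoff:cutoff + h]
    with create_rw_model(train):
        idata = pm.sample(tune=2000, target_accept=0.9)
    # ... project alpha/beta forward h steps and compare predictions to `test` ...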

Here is a snippet for fitting the model and generating the relevant plots; it can be run after the data-loading steps of the linked notebook.

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm


def create_rw_model(price_data):
    price_data_zscored = (price_data - price_data.mean()) / price_data.std()
    
    with pm.Model(coords={"time": price_data.index.values}) as model_randomwalk:
        # std of random walk
        sigma_alpha = pm.Exponential("sigma_alpha", 50.0)
        sigma_beta = pm.Exponential("sigma_beta", 50.0)

        alpha = pm.GaussianRandomWalk(
            "alpha", sigma=sigma_alpha, init_dist=pm.Normal.dist(0, 10), dims="time"
        )
        beta = pm.GaussianRandomWalk(
            "beta", sigma=sigma_beta, init_dist=pm.Normal.dist(0, 10), dims="time"
        )    
        # Define regression
        regression = alpha + beta * price_data_zscored.GFI.values

        # Assume prices are Normally distributed, the mean comes from the regression.
        sd = pm.HalfNormal("sd", sigma=0.1)
        likelihood = pm.Normal("y", mu=regression, sigma=sd, observed=price_data_zscored.GLD.to_numpy())
    return model_randomwalk

with create_rw_model(prices):
    trace_rw_fs = pm.sample_prior_predictive()
    trace_rw_fs.extend(pm.sample(tune=2000, target_accept=0.9))

first_half_sample = prices.index[:500]
second_half_sample = prices.index[500:]

with create_rw_model(prices.loc[first_half_sample]):
    trace_rw_first = pm.sample_prior_predictive()
    trace_rw_first.extend(pm.sample(tune=2000, target_accept=0.9))

with create_rw_model(prices.loc[second_half_sample]):
    trace_rw_second = pm.sample_prior_predictive()
    trace_rw_second.extend(pm.sample(tune=2000, target_accept=0.9))

fig = plt.figure(figsize=(8, 6), constrained_layout=False)
ax = plt.subplot(111, xlabel="time", ylabel="alpha", title="Change of alpha over time.")
d_ = az.extract(trace_rw_fs.posterior, var_names="alpha")
ax.plot(d_.coords["time"].values, d_.mean(dim="sample"), "r", alpha=1, label="Full Sample")
ax.plot(d_.coords["time"].values, d_, "r", alpha=0.03)
d_ = az.extract(trace_rw_first.posterior, var_names="alpha")
ax.plot(d_.coords["time"], d_.mean(dim="sample"), "b", alpha=1, label="First Half Sample")
ax.plot(d_.coords["time"].values, d_, "b", alpha=0.03)
d_ = az.extract(trace_rw_second.posterior, var_names="alpha")
ax.plot(d_.coords["time"].values, d_.mean(dim="sample"), "g", alpha=1, label="Second Half Sample")
ax.plot(d_.coords["time"].values, d_, "g", alpha=0.03)
ax.legend();

fig = plt.figure(figsize=(8, 6), constrained_layout=False)
ax = plt.subplot(111, xlabel="time", ylabel="beta", title="Change of beta over time.")
d_ = az.extract(trace_rw_fs.posterior, var_names="beta")
ax.plot(d_.coords["time"].values, d_.mean(dim="sample"), "r", alpha=1, label="Full Sample")
ax.plot(d_.coords["time"].values, d_, "r", alpha=0.03)
d_ = az.extract(trace_rw_first.posterior, var_names="beta")
ax.plot(d_.coords["time"], d_.mean(dim="sample"), "b", alpha=1, label="First Half Sample")
ax.plot(d_.coords["time"].values, d_, "b", alpha=0.03)
d_ = az.extract(trace_rw_second.posterior, var_names="beta")
ax.plot(d_.coords["time"].values, d_.mean(dim="sample"), "g", alpha=1, label="Second Half Sample")
ax.plot(d_.coords["time"].values, d_, "g", alpha=0.03)
ax.legend();

[Plots: rolling alpha and beta estimates over time for the full-sample, first-half, and second-half fits]

def sample_corr(x1, x2):
    # Correlation between posterior draws of two parameters, computed
    # row by row (i.e. separately at each time point) across all draws.
    index = x1.index
    x1_vals = x1.values
    x2_vals = x2.values
    corrs = []
    for i in range(x1_vals.shape[0]):
        corrs.append(np.corrcoef(x1_vals[i, :], x2_vals[i, :])[0, 1])
    return pd.Series(corrs, index=index)

alpha = az.extract(trace_rw_fs, var_names="alpha").to_series().unstack(level=["chain", "draw"])
beta = az.extract(trace_rw_fs, var_names="beta").to_series().unstack(level=["chain", "draw"])

corrs = sample_corr(alpha, beta)
corrs.plot(title="Sample Posterior Correlation of alpha and beta")

plt.scatter(alpha.loc["2011-01-03"], beta.loc["2011-01-03"], label="2011-01-03")
plt.scatter(alpha.loc["2014-01-03"], beta.loc["2014-01-03"], label="2014-01-03")
plt.legend()
plt.xlabel("alpha")
plt.ylabel("beta")
plt.title("alpha vs beta")


Some (partial) answers:

  1. You can call pm.sample_posterior_predictive in a new model block while re-using your idata; this lets you run the GRW forward. There are examples here, and a rough sketch below this list.

  2. The simple way you suggest is basically the best way. PSIS-LOO is still somewhat valid for time-series, so you can look at that. There is also “leave future out” PSIS, but it’s not really implemented anywhere for PyMC. Re: model speed, try running it with nutpie or JAX to get that 600 seconds down, then doing repeated refits won’t be so painful.

  3. The inference is done jointly over the entire time series, so evidently you have a kind of regime switch in the 2nd half of your data that changes how the model views the first half. That means your results are invalid for backtesting, but they’re fine for forecasting. IDK what your use case is. My intuition is that a GP would have the same problem (the future would shift around the parameters used to compute the past), although it has so much flexibility that maybe I’m wrong.

  4. Hard to say anything just from these plots. If you’re worried about the correlations being important, you could explicitly model them with a multivariate GRW?
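
To make (1) concrete, here is a rough sketch of the “new model block + sample_posterior_predictive” pattern. The pm.Flat dummies share names with the fitted model, so sample_posterior_predictive pulls their values from your posterior instead of re-sampling them; the horizon, the placeholder covariate path gfi_future (which should be the forward forecast of the z-scored covariate), and the exact way the walks are continued are illustrative assumptions, not the notebook’s code.

import numpy as np
import pymc as pm

horizon = 30                     # hypothetical forecast horizon
gfi_future = np.zeros(horizon)   # placeholder for the forward forecast of the z-scored covariate

with pm.Model(coords={"time": prices.index.values, "time_future": np.arange(horizon)}) as forecast_model:
    # Flat "dummies" with the same names as in the fitted model: their values
    # come from trace_rw_fs rather than being sampled here.
    sigma_alpha = pm.Flat("sigma_alpha")
    sigma_beta = pm.Flat("sigma_beta")
    alpha = pm.Flat("alpha", dims="time")
    beta = pm.Flat("beta", dims="time")
    sd = pm.HalfFlat("sd")

    # Continue each random walk from its last in-sample value, so the forecast
    # carries the drift uncertainty instead of freezing the coefficients.
    alpha_future = pm.GaussianRandomWalk(
        "alpha_future", sigma=sigma_alpha,
        init_dist=pm.Normal.dist(alpha[-1], sigma_alpha), dims="time_future",
    )
    beta_future = pm.GaussianRandomWalk(
        "beta_future", sigma=sigma_beta,
        init_dist=pm.Normal.dist(beta[-1], sigma_beta), dims="time_future",
    )

    pm.Normal("y_future", mu=alpha_future + beta_future * gfi_future, sigma=sd, dims="time_future")

    forecast = pm.sample_posterior_predictive(
        trace_rw_fs, var_names=["alpha_future", "beta_future", "y_future"]
    )

Because alpha_future and beta_future are simulated as random walks per posterior draw, the forecast bands widen with the horizon rather than just carrying the terminal values forward.

And for (4), a minimal sketch of what “explicitly model the correlation” could look like: replace the two independent GaussianRandomWalks with a bivariate walk built from MvNormal innovations under an LKJ prior. The priors and the prices_z name here are placeholders, not the notebook’s.

prices_z = (prices - prices.mean()) / prices.std()

with pm.Model(coords={"time": prices_z.index.values, "coef": ["alpha", "beta"]}) as model_mv_walk:
    # Correlated innovations for (alpha, beta) with an LKJ prior on the 2x2
    # covariance; the cumulative sum turns them into a bivariate random walk.
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0, sd_dist=pm.Exponential.dist(50.0), compute_corr=True
    )
    innovations = pm.MvNormal("innovations", mu=np.zeros(2), chol=chol, dims=("time", "coef"))
    walk = pm.Deterministic("walk", innovations.cumsum(axis=0), dims=("time", "coef"))

    mu = walk[:, 0] + walk[:, 1] * prices_z.GFI.values
    sd = pm.HalfNormal("sd", sigma=0.1)
    pm.Normal("y", mu=mu, sigma=sd, observed=prices_z.GLD.to_numpy())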


Thanks @jessegrabowski! Testing with both jax and nutpie, I see some impressive speedups, and the posteriors look identical.

On my 8-core laptop (11th Gen Intel(R) Core™ i7-1185G7 @ 3.00GHz) I see the following times:

Time with default sampler: 413s
Time with nutpie: 120s
Time with blackjax: 118s
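
(For reference, a sketch of how the alternative backends can be selected; this assumes a PyMC version where pm.sample exposes the nuts_sampler argument, and the nutpie / blackjax packages need to be installed separately.)

with create_rw_model(prices):
    trace_default = pm.sample(tune=2000, target_accept=0.9)                            # default PyMC NUTS
    trace_nutpie = pm.sample(tune=2000, target_accept=0.9, nuts_sampler="nutpie")      # Rust-based nutpie
    trace_blackjax = pm.sample(tune=2000, target_accept=0.9, nuts_sampler="blackjax")  # JAX-based blackjax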

I also tried ADVI, which showed some speed improvement, but the posteriors look quite different. I’m not familiar with this technique, but my high-level understanding is that it tries to approximate the posterior using a multivariate Gaussian distribution, so my takeaway here is that the approximation is fairly poor for this model.
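
(For anyone following along, the ADVI fit can be run with something like this rough sketch; the iteration count is just illustrative.)

with create_rw_model(prices):
    approx = pm.fit(n=50_000, method="advi")  # mean-field ADVI; watch the ELBO trace for convergence
    trace_advi = approx.sample(1000)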


You can also try the numpyro sampler; I’ve had better luck with it than with blackjax.

Just a small point: the pymc default, numpyro, blackjax, and nutpie are all HMC samplers (maybe you know this already, but I wanted to clarify because of how the legend of your graph is written).

You are right that ADVI is an approximation of the posterior, and it’s not great that the ADVI results disagree so much with NUTS. There is another variational inference algorithm called pathfinder you could try too. It’s in the pymc_experimental package. It reportedly gives better approximations than vanilla ADVI, and it’s written in JAX, so it’s very fast.

Yup, understood on the HMC samplers point, but agreed that the legend implies some confusion; thanks for clarifying.

Will take a look at the numpyro sampler later to see what kind of speedups I get there and also play around with pathfinder, thanks!

@jessegrabowski looking a bit more into pathfinder, it seems like it’s mostly used as a way to choose better initial conditions for HMC sampling? Wondering if you have any examples of doing this in PyMC?

In addition to Jesse’s suggestions, I’ve used AWS Batch to solve this issue in the past. It obviously costs money, but it allows you to backtest in parallel, cutting the total time down to the time it takes to perform inference once.
