Forecasting out-of-sample discrete_markov_chain.ipynb

Hi everyone!! :+1:
I am trying to do a quick out-of-sample forcasting on the example of the markov switching autoregressive model. I would like to calculate a forecast year from the last historical data.
Can anyone help me/has anyone done anything yet? Thanks in advance!
@jessegrabowski, @ricardoV94 told me that you are the forecast wizard, can you give me a cue? :blush:


1 Like

Hi Nicolo!

The short and unhelpful answer is that you need to use a predictive model. Here’s an article that explains all about predictive models. The tl;dr is that in PyMC, you’re allowed to recycle posteriors inside new models. This is useful because you can tweak parts of your model to get it to do new and interesting things, like recover marginalized variables, do counter-factual analysis, and, importantly, forecast.

When you fit the HMM, you use the scan to do a sequence of one-step ahead forecasts. Looking at the model again:

    y = pm.ConstantData("y", dta_hamilton, dims="dates")
    yobs = pm.ConstantData("yobs", dta_hamilton.iloc[4:], dims=["obs_dates"])

    # ... other stuff

    result, updates = pytensor.scan(
            {"input": hidden_states, "taps": [0, -1, -2, -3, -4]},
            {"input": y, "taps": [-1, -2, -3, -4]},
        non_sequences=[state_mus, ar_coefs],

    sigma = pm.HalfCauchy("sigma", 0.8)
    obs = pm.Normal("y_hat", mu=result, sigma=sigma, observed=yobs, dims=["obs_dates"])

So the data appears twice, once as y, which is the full dataset, and again as yobs, which drops the first 4 observations. In this way, we’re doing one-step ahead forecasts.

This is nice for fitting but useless for forecasting. For forecasting, we want to use the last 4 observations to kick of a recursive cycle of input data → step forward → output data → input data → step forward …

To do this, we need to re-write the model. Here’s how it looks, with some comments that I hope help to explain what is being done and why:

# Number of steps forward to forecast
H = 20

# Generate date coordinates for the forecasts, that will make it easier to plot it together with the observed data.
# pd.date_range is left inclusive, so we want to drop the first date (that's data). 
coords.update({'forecast_dates':pd.date_range(dta_hamilton.index[-1], freq=dta_hamilton.index.freq, periods=H+1)[1:]})

# Notice that we're making a new model! I'm calling this one forecast_model, but it just needs to be new.
with pm.Model(coords=coords) as forecast_model:
    # Grab the last 4 observations as the first inputs to our forecasts
    last_obs = pm.ConstantData("last_obs", dta_hamilton.iloc[-4:])
    # Variables that we want to be sampled from the posterior we obtained during fitting should use pm.Flat distributions.
    # This is not strictly necessary -- pm.sample_posterior_predictive matches on the name of the random variables, and if
    # it finds a match, it will totally ignore the model variable. BUT, pm.Flat will raise an error if it doesn't find a 
    # match, which is desireable in our case.
    P = pm.Flat("P", size=(2,2))
    hidden_states = pm.Flat("hidden_states", dims=["dates"])
    # I found that hidden_states was sampled as float64, which raised an error, so I had to add this intermediate variable.
    hidden_states_int = pm.Deterministic('hidden_states_int', hidden_states.astype('int64'))
    # A DiracDelta is a distribution with only a single value. DiscreteMarkovChain needs an initial distribution, and we will
    # tell it to always begin from the last estimated hidden state.
    initial_hidden = pm.DiracDelta.dist(hidden_states_int[-1], shape=(1,))
    # Sample a new sequence of hidden states going forward from the end of the data. I say steps = H + 2 because that's what
    # ended up working...
    forecast_states = DiscreteMarkovChain('forecast_states', P=P, init_dist=initial_hidden, steps=H + 2)
    # We lose the first 4 hidden states when we scan (because there are 4 lags), so we need to append the initial hidden
    # states to get the ball rolling.
    all_hidden = pm.Deterministic('all_hidden', pt.concatenate([initial_hidden, forecast_states]))
    # More pm.Flat, because we just want to sample these from the posterior
    state_mus = pm.Flat("state_mus", dims=["states"],)
    ar_coefs = pm.Flat("coefs", size=order, dims=["ar_params"])

    # New scan. Notice that now we're using outputs_info instead of sequences -- this is what signals that we're doing 
    # recursive computation. Otherwise things are pretty much the same; we even recycle the AR_step function.
    result, updates = pytensor.scan(
        sequences=[{"input": all_hidden, "taps": [0, -1, -2, -3, -4]}],
        outputs_info=[{'initial':last_obs, 'taps':[-1, -2, -3, -4]}],
        non_sequences=[state_mus, ar_coefs],
    # More pm.Flat!
    sigma = pm.Flat("sigma")
    mu = pm.Deterministic('mu', result, dims=['forecast_dates'])
    # We actually want to sample this variable, so we need to 1) use a real distribution (NOT pm.Flat!), and 2) give it a 
    # unique name (that wansn't in the parent model)
    forecast = pm.Normal("forecast", mu=result, sigma=sigma, dims=["forecast_dates"])
    # Call pm.sample_posterior_predictive, providing the idata that contains our posteriors
    idata_forecast = pm.sample_posterior_predictive(idata, var_names=['forecast'])

Here’s how the forecasts look (HDI shaded from 50% (darkest) to 99% (lightest)):

I might have flubbed the indices, make sure to double-check all that.


Wow!! Thany you very much :smiling_face:
In my opinion it could be added to the example notebook. It could be very useful for some novices like me.
Can you also share me the code for the graph?
In the next few days I would like to implement from your example a VAR model and then maybe add some exogenous variables. Thank you again :heart:


Plotting code:

def shade_background(ppc, ax, var_name, x_grid, palette="cividis", hdi_min=0.5, hdi_max=0.99, color_grid_size=100):
    palette = palette
    cmap = plt.get_cmap(palette)
    hdi_probs = np.linspace(hdi_min, hdi_max, color_grid_size)[::-1]
    probs_scaled = (hdi_probs - hdi_min) / (hdi_max - hdi_min)
    colors = map(cmap, probs_scaled)
    for color_val, hdi_prob in zip(colors, hdi_probs):
        hdi = az.hdi(ppc.posterior_predictive[var_name], hdi_prob=hdi_prob)[var_name]
        ax.fill_between(x_grid, *hdi.values.T, color=color_val, alpha=0.25, 

fig, ax = plt.subplots(figsize=(14, 4), dpi=144)
[spine.set_visible(False) for spine in ax.spines.values()]
ax.grid(ls='--', lw=0.5, zorder=0)

ax.plot(dta_hamilton.index, dta_hamilton.values)
shade_background(idata_forecast, ax, 'forecast', coords['forecast_dates'], palette='Reds_r')
forecast_mu = idata_forecast.posterior_predictive.forecast.mean(dim=['chain', 'draw'])
ax.plot(coords['forecast_dates'], forecast_mu, ls='--', c='k', lw=2)

There was actually an error in the code that made the graph I posted, so here’s a corrected output:

1 Like

For VAR models there is an entire example notebook dedicated to them, so you should have a look at that. You could also give the under-development statespace API a try.

1 Like

Thank you very much, you have been really kind. Thanks for the work you programmers do, the pymc library is something great. :heart::tada:
One last question, are there any examples of ARX/VARX models in pymc by any chance?
Speaking of statespace I am not very clear about the difference with in Markov switch type models. That is, instead of having two states the models with statespace have the parameters moving every t. Have I said something vaguely correct?

I’m working on examples for SARIMAX now, but they’re not done. The way to do it will be analogous to this example of an ARIMA-GARCH model, except instead of an ARIMA model with GARCH distributed errors, you will have a regression model with ARIMA-distributed errors. The model is:

\begin{align}y_t &= X_t \beta + \eta_t \\ \eta_t &= \sum_{i=1}^p \rho_i \eta_{t-i} + \sum_{j=1}^q \theta_j \varepsilon_{t-j} + \varepsilon_t \end{align}

So inside the scan you need to compute eta = y - X @ beta, then model the mean of eta mu_eta = (rhos * eta_lags).sum() + (thetas * epsilon_lags).sum(). Compute the current step epsilon as epsilon_t = eta - mu_eta, and compute your model’s prediction as y ~ N(X @ beta + mu_eta, sigma)

This is all like what is done in the 2nd example in the GARCH notebook, except there the ARIMA dynamics play the role of the regression, and the GARCH dynamics play the role of the ARIMA dynamics.

Re: state-space, it’s just a mathematical framework for organizing time-series models. Many models can be written in the state-space framework, so it’s like a unifying family. Time-varying parameters can also be cast into a state-space (as latent “states” to be estimated, rather than single parameters), but they aren’t the defining feature.

You can think of an HMM as a type of non-linear state-space model, where one of the states (called the “state”, which is confusing) is discrete, while other state (the emission – the actual data we observe) is a continuous function of the discrete state.

1 Like

Very clear, so in fact in statespace you can recreate “almost” all models even the nonlinear ones like full time-varying parameter or with state parameters like Markov switching.
Are there some econometric models that cannot be recreated in statespace?

No, I don’t think so.

But be careful, because the “statespace” packages you usually see (ours, and the one in statsmodels.tsa) is talking about linear gaussian state space models, which a lot of models don’t fall into.

HMMs are not linear, so you can’t represent those in the LGSS framework. Certain Holt-Winter exponential smoothing also can’t be expressed as LGSS. Non-gaussian models also don’t fall under the LGSS umbrella, so GARCH models are also out.

But in general, “state space” is so broad that yeah, pretty much everything falls under it’s umbrella.

1 Like

Thank you very much for you time @jessegrabowski! :blush:
Very helpful and kind.