Best Practices for Time Series Forecasting

mgilbert · May 30, 2023, 8:52pm

I’m wondering if there are some examples or docs on the canonical way to forecast time series. In particular, I’m a bit unclear on how to use coords and Data when updating covariates which are used for both inference and out of sample forecasting. Concretely, I have a simple model as follows (sudo code, full code below)

coords = {"date": train_dates}
with pm.Model() as ts_model:

    for k in coords.keys():
        ts_model.add_coord(k, coords[k], mutable=True)

    x = pm.MutableData(
        name="x",
        value=x_train,
        dims=["date"],
    )

    beta = pm.Normal("beta", mu=0, sigma=1)

    mu = pm.Deterministic(
        "mu",
        var=beta * x,
        dims="date",
    )
    sigma = pm.HalfNormal("sigma", sigma=1)

    y = pm.Normal(
        "y",
        mu=mu,
        sigma=sigma,
        observed=y_train,
        dims="date",
    )

Should covariate x that is used for inference and as inputs for forecasting be MutableData or should these be two different random variables in the model DAG, i.e. x_train and x_test?
How should coords be used when they need to change for in sample inference and out of sample forecasting? My impression is this is only supported outside of the context manager.
What are the advantages / use cases of using of forecasting using pymc.sample_posterior_predictive(predictions=True) vs. updating the observed data via model.set_data and then using pymc.sample_posterior_predictive(predictions=False)

I’ve come across several overviews of timeseries and model predictions (linked below) but am struggling to distill out the current best practices (for reference I am using pymc v4.2.1).

This example takes the approach of using separate coords for forecasting and fitting and also uses pymc.sample_posterior_predictive(predictions=True) Forecasting with Structural AR Timeseries — PyMC example gallery

This recommends using MutableData for the observed data and then resetting the observed data for the forecast period Predict with new coords leads to conflicting sizes - #2 by lucianopaz

Here is a full example of a simple model Y ~ Normal(beta*X, sigma) where both Y and X are time series. I have set my model up to train on a training set and then forecast on an out of sample test set. The model code I have is set up as follows

import pymc as pm
import arviz as az
import numpy as np
import pandas as pd
from scipy.signal import medfilt

np.random.seed(1337)

train_dates = pd.date_range("2020-01-01", "2020-12-31")
test_dates = pd.date_range("2021-01-01", "2021-01-31")

x_train = pd.Series(np.random.randn(len(train_dates)), index=train_dates) * 3
x_test = pd.Series(np.random.randn(len(test_dates)), index=test_dates) * 3


real_beta = 0.8

y_train = real_beta * x_train + pd.Series(
    np.random.randn(len(train_dates)), index=train_dates
)
y_test = real_beta * x_test + pd.Series(
    np.random.randn(len(test_dates)), index=test_dates
)


y_test_dummy = pd.Series(0, index=test_dates)


coords = {"date": train_dates}

with pm.Model() as ts_model:

    for k in coords.keys():
        ts_model.add_coord(k, coords[k], mutable=True)

    x = pm.MutableData(
        name="x",
        value=x_train,
        dims=["date"],
    )

    beta = pm.Normal("beta", mu=0, sigma=1)

    mu = pm.Deterministic(
        "mu",
        var=beta * x,
        dims="date",
    )
    sigma = pm.HalfNormal("sigma", sigma=1)

    y = pm.Normal(
        "y",
        mu=mu,
        sigma=sigma,
        observed=y_train,
        dims="date",
    )

    # Fit to the data
    model_trace = pm.sample(
        chains=4,
        idata_kwargs={"log_likelihood": True},
    )
    model_posterior_predictive = pm.sample_posterior_predictive(trace=model_trace)


# Predict
test_coords = {"date": test_dates}

with ts_model:
    ts_model.set_data("x", x_test, coords=test_coords)
    pred_posterior_predictive = pm.sample_posterior_predictive(model_trace.posterior, predictions=True)

This gives the following outputs

az.extract_dataset(model_posterior_predictive.posterior_predictive).y.mean("sample").to_series().plot(title="Train Data")
model_posterior_predictive.observed_data.y.to_series().plot(color="k", marker="x", linestyle="None")


plt.figure()
az.extract_dataset(pred_posterior_predictive.predictions).y.mean("sample").to_series().plot(title="Test Data")
y_test.plot(color="k", marker="x", linestyle="None")

kforeman · June 7, 2023, 5:51pm

Thanks for starting this conversation, I’m very interested in it as well. I have tried using MutableData to make forecasts, and it works for simple linear models like in your example but if I include e.g. AR terms then the estimates between in-sample and out-of-sample don’t align. I’m guessing in that case it’s necessary to not just mutate the data, but also create a separate RV that forces the alignment like the Forecasting with Structural AR Timeseries example does using DiracDelta.

cluhmann · June 7, 2023, 8:06pm

@jessegrabowski might be equipped to make suggestions.

jessegrabowski · June 8, 2023, 5:53am

We should first distinguish between recursive and non-recursive time-series models. @kforeman hinted at this distinction, which I just want to explore formally. Before we start, I want to address the numbered questions in the OP:

In the non-recursive case, use MutableData. In the recursive case, it might be necessary to do something more complex. I will present some solutions that work for me, but @ricardoV94 is working on some more sophisticated/generalization approaches with model cloning and reconstruction here. I won’t address any of these, but suffice to say that forcasting should be better more convenient in the future. But that’s just my forecast
Since the length of variables is going to change, you will need to either omit coords or re-set them after estimation. Personally I omit them at sampling time, because it’s a hassle.
The predictions=True flag has nothing at all to do with forecasting. AFAIK, it just makes a new group in your idata object. This allows you to, in the same object, hold posterior predictive samples for both the training and test data. If you do not set new data with pm.set_data and run pm.sample_posterior_predictive(predictions=True), you will just end up with the training distribution, stored in a different place.

On to some specifics:

Non-Recursive Models

These are models in which time simply enters as a covariate. The model the OP posted sort of falls into this class. But actually the model he posted is more like a “cross-section of times”, since there is no time variable. I am splitting hairs here because in order to forecast, as opposed to predict, we should be able to arbitrarily extend the data out into the future. In the posted case, we need the covariate observations from the future to make forecasts, which is of course impossibe.

What I have in mind is instead more like a deterministic trend model: y_t = \alpha + \gamma t + \epsilon_t. Here, we have a time variable t, which we can arbitrarily extend out into the future because it’s just a counter. Here’s a code example:

Example 1: Deterministic Trend

Data Generation

true_alpha = 1
true_gamma = 0.2
true_sigma = 1.5
T = 100

noise = rng.normal(scale=true_sigma, size=T)
t = np.arange(T)
data = true_alpha + true_gamma * t + noise
train_data = data[:90]
test_data = data[-10:]

plt.plot(train_data, color='tab:blue')
plt.plot(np.arange(90, 100), test_data, ls='--', color='tab:red')

Sampling

with pm.Model() as det_trend:
    t_pt = pm.MutableData('t', t[:90])
    alpha = pm.Normal('alpha')
    gamma = pm.Normal('gamma')
    
    mu = pm.Deterministic('mu', alpha + gamma * t_pt)
    sigma = pm.Exponential('sigma', 1)
    
    y_hat = pm.Normal('y_hat', mu=mu, sigma=sigma, observed=train_data, shape=t_pt.shape)
    idata = pm.sample()

Prediction and Forecasting

with det_trend:
    #in-sample 
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)
    
    #out-of-sample
    pm.set_data({'t':t[-10:]})
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True, predictions=True)

In this case I generated the test data so I have it to plot and examine, but you can see that in principle I can extend these forecasts out forever by just setting the t variable to whatever I want, even if we don’t have any new data yet. Obviously in this example it doesn’t generate very exciting forecasts, but this methodology is actually very powerful because nothing prevents you from using non-linear functions of t. In this next example I make forecasts on real data using a prophet-like model, which uses a piecewise linear trend and Fourier seasonality to create rich time dynamics that can be arbitrarily extended into the future:

Example 2: Prophet

Data are from Forecasting: Principles and Practice. I centered and normalized the values so I could autopilot though the prior definitions.

Model

First I make some functions that take in t and return some non-linear transformations:

def create_piecewise_trend(t, t_max, n_changepoints):    
    s = pt.linspace(0, t_max, n_changepoints+2)[1:-1]
    A = (t[:, None] > s)*1
    
    return A, s

def create_fourier_features(t, n, p=365.25):
    x = 2 * np.pi * (pt.arange(n)+1) * t[:, None] / p
    return pt.concatenate((pt.cos(x), pt.sin(x)), axis = 1)

def generate_features(t, t_max, n_changepoints=10, n_fourier=6, p=365.25):
    A, s = create_piecewise_trend(t, t_max, n_changepoints)
    X = create_fourier_features(t, n_fourier, p)
    
    return A, s, X

Then use these in my PyMC model:

t = np.arange(df.shape[0])

# This value isn't on the computation graph (it's not computed dynamically from `t_pt`)
t_max = max(t)

with pm.Model() as prophet_model:
    t_pt = pm.MutableData('t', t)

    # We have monthly data, so p=12 leads to annual seasonality
    A, s, X = generate_features(t_pt, t_max, n_changepoints=10, n_fourier=6, p=12)
    
    initial_slope = pm.Normal('initial_slope')
    initial_intercept = pm.Normal('initial_intercept')
    
    # n_changepoint offsets terms to build the peicewise trend
    deltas = pm.Normal('offset_delta', shape=(10,))
        
    intercept = initial_intercept + ((-s * A) * deltas).sum(axis=1)
    slope = initial_slope + (A * deltas).sum(axis=1)
    
    # n_fourier * 2 seasonal coefficients
    beta = pm.Normal('beta', size=12)
    
    mu = pm.Deterministic('mu', intercept + slope * t_pt+ X @ beta)
    sigma = pm.Exponential('sigma', 1)
    y_hat = pm.Normal('y_hat', mu=mu, sigma=sigma, observed=df.values.ravel(), shape=t_pt.shape[0])
    
    idata = pm.sample()

Forcasting

with prophet_model:
    #in-sample 
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)
    
    #out-of-sample, predict 3 years of home sales
    last_t = t[-1]
    forcast_t = np.arange(last_t, last_t + 36)
    pm.set_data({'t':forcast_t})
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True, predictions=True)

Now I’m working with real data, and I didn’t set aside any test set on purpose. I want to demonstrate that you can just plug in any value for t that you want and just go hog wild predicting the future. What could possibly go wrong?

Recursive Models

Recursive models are slightly more complex, because we are learning some policy function y_{t+1} = g(y_t). Making forecasts, then, isn’t as simple as just doing pm.set_data, but it also isn’t much more complex. Basically, we’re going to:

Have an unknown distribution x0 that descirbes the initial state of the system before we see data
From this initial x0, sample the posterior \theta s that parameterize the g function (including x0)
Replace x0 with a DiracDelta(y[-1]), a spike distribution over the last observed data point
Simulate forward using the g function

Here’s a simple example using a Guassian random walk. In this case, y_{t+1} = g(y_t) = y_t + \epsilon_t, and we just have to learn one parameter, \sigma. It’s easy to implement this model in PyMC using .cumsum():

Generate Data

true_sigma = 0.1
T = 100

innovations = rng.normal(scale=true_sigma, size=T)
data = innovations.cumsum()
plt.plot(data)

PyMC Model

with pm.Model() as grw:
    sigma_innov = pm.Exponential('sigma_innov', 1)
    innovs = pm.Normal('innovations', sigma=sigma_innov, size=100)
    
    sigma = pm.Exponential('sigma', 1)
    y_hat = pm.Normal('y_hat', mu=innovs.cumsum(), sigma=sigma, observed=data)
    
    idata = pm.sample(init='jitter+adapt_diag_grad', target_accept=0.99)

Prediction and Forecasting

# First the easy part, in-sample prediction
with grw:
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)

# Next simulate the recursive model
H = 25
simulations = np.empty((n_chains, n_draws, H))
x0 = data[-1]

simulations[:, :, 0] = x0

# simulate the forward model
for t in range(1, H):
    simulations[:, :, t] = simulations[:, :, t-1] + rng.normal(scale=idata.posterior.sigma_innov) + rng.normal(scale=idata.posterior.sigma)

    
# Add the simulations to the idata in a "predictions" group
# Congrats, you just did a posterior predictive sampling by hand
import xarray as xr
idata.add_groups({'predictions':xr.Dataset(data_vars={'y_hat':(['chain', 'draw', 'y_hat_dim_2'], simulations)},
                                           coords={'chain':np.arange(n_chains),
                                                   'draw':np.arange(n_draws), 
                                                   'y_hat_dim_2':np.arange(T, T+H)})})

Again, we have a nice funnel of increasing uncertainty, so we can be somewhat confident that nothing went wrong. We also know that the g function is basically the identity, so we should also be happy with the fact that our forecasts are a straight line out from the last data point.

What about more complex models?

Obviously, the GRW is very simple, so doing the forward simulations by hand is really convenient. If we have a more complex model, it might be quite onerous to do this. Many time series models can be cast in linear state space form, though. For example, SARIMA, VARMAX, and exponential smoothing all admit a state-space representation. If you can write down a state-space system, it’s quite easy to do the forward simulations. You would also use this approach to run impulse response functions.

Other models, though, such has HMM or GARCH, don’t admit such a form, and it quickly becomes too much to ask to do the simulation. In that case, I suggest a hack: include missing values after your data to PyMC automatically make predictions for you. Your prediction horizon will be fixed in this case, but at least you won’t have to do anything special to get the predictions. Here’s the GRW model again using this method. The data is unchanged from the first example:

H = 25
extended_data = np.r_[data, np.full(H, np.nan)]

with pm.Model() as grw:
    sigma_innov = pm.Exponential('sigma_innov', 1)
    innovs = pm.Normal('innovations', sigma=sigma_innov, size=100 + H)
    
    sigma = pm.Exponential('sigma', 1)
    y_hat = pm.Normal('y_hat', mu=innovs.cumsum(), sigma=sigma, observed=extended_data)
    
    idata = pm.sample(init='jitter+adapt_diag_grad', target_accept=0.99)
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)

As you can see, we get (almost) the same output with less hassle. The only difference is that the forecasts begin from the last prediction, rather than the last data point (the orange line continues the blue line, rather than the red one).

Conclusion

So there you go. Some pointers on forecasting. All the code used is in this notebook, in case you’d like to look at it all gathered in one place. As I mentioned, there is new functionality that will make foreacsting with recursive models easier, it’s just that I haven’t played around with it much yet so I can’t really say anything about it.

twiecki · June 8, 2023, 10:38am

This should be an example in pymc-examples.

kforeman · June 8, 2023, 8:52pm

Thanks @jessegrabowski, looks great!

I have also settled on just including the forecast period as missing data for now. The main limitation is that it doesn’t (easily) allow for counterfactual scenario forecasts, but otherwise it’s the simplest solution that doesn’t effectively require making every change to the model twice.

jessegrabowski · June 9, 2023, 12:33am

I’m not sure what you mean by counterfactual scenario forecasts – something like difference-in-difference analysis? Or impulse response functions/scenario based simulation (scroll down towards the bottom)? Actually, the series of causal inference notebooks have some nice techniques for handling something very close to forecasting. I haven’t carefully read them, but this is a nice opportunity to do so. There is also the do-operator in pymc-experimental, depending on what you’re doing.

kforeman · June 9, 2023, 2:23am

Yep, basically something like that - generating multiple forecasts with the same model fit but different future “data”, e.g. y ~ a + B*x + GRW where I want to generate forecasts y for different future trajectories of x without refitting the model. The do operator looks incredibly useful, thanks for pointing me towards that!

jessegrabowski · June 9, 2023, 2:25am

Is x_t modeled jointly with y_t, or is it exogenous? If it’s exogenous, you can just use pm.set_data together with the missing value method.

jessegrabowski · June 14, 2023, 4:21am

@ricardoV94 and @tcapretto just dropped a blog post on this subject that goes much, much deeper into how to handle recursive models than what I showed here. Turns out there’s no need to muck around with copying models, do-operators, or anything of the sort: PyMC can handle updates to sections of your model out of the box just fine. Everyone should definitely check it out.

Galen_Seilis · August 14, 2024, 9:02pm

Thanks for these examples, @jessegrabowski .

For the Prophet model, how would you include groupings in the data? I tried setting coords={'group': df.group.unique()} and used dim=group for parameters, but I am getting lots of shape issues.

ValueError: Input dimension mismatch: (input[0].shape[0] = 10, input[1].shape[0] = 560)
Apply node that caused the error: Composite{...}(sigma_log__, Gemv{no_inplace}.0, t, initial_slope, Sum{axis=1}.0, Sum{axis=1}.0, initial_intercept)
Toposort index: 16
Inputs types: [TensorType(float64, shape=(None,)), TensorType(float64, shape=(560,)), TensorType(int32, shape=(None,)), TensorType(float64, shape=(None,)), TensorType(float64, shape=(None,)), TensorType(float64, shape=(None,)), TensorType(float64, shape=(None,))]
Inputs shapes: [(10,), (560,), (560,), (10,), (560,), (560,), (10,)]
Inputs strides: [(8,), (8,), (4,), (8,), (8,), (8,), (8,)]
Inputs values: ['not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown']
Outputs clients: [[All{axes=None}(Composite{...}.0)], [Sum{axes=None}(sigma_log___logprob)], [Switch(ExpandDims{axis=0}.0, Composite{...}.2, [-inf])]]

group is a column of strings (there are 10 distinct values).

I have to admit that even after trying to read about coords and dims (e.g . Distribution Dimensionality — PyMC 5.16.2 documentation) I still find it hard to understand.

jessegrabowski · August 15, 2024, 7:47am

It looks like you were on the right track, you just have to use fancy indexing to expand the group parameter vectors and make everything match up. (Note: I linked the numpy docs because the same rules apply in pytensor, which is what pymc is using under the hood).

The best way to get the indices you need is by using pd.factorize. This has the advantage of also generating the labels you need for the coords!

For some panel data, I use world bank data on GDP per capita for every country in the world

import pandas as pd
from pandas_datareader.wb import WorldBankReader

# Download per-capita GDP data
data = WorldBankReader(symbols=['NY.GDP.PCAP.CD'],
                       countries='all',
                       start=1900,
                       end=2024).read()

# countries = "all" includes a bunch of aggregated economies, which I filter out
countries = data.reset_index().country.unique()
filter_words = ['Asia', 'Africa', 'America', 'World', 
              'Latin', 'Caribbean', 'Europe', 'Pacific', 
              'dividend', 'area', 'situation', 'countries', 
                'income', 'only', 'total', 'blend', 'classif', 'member', 'other', 'states']
whitelist = ['South Africa', 'American Samoa', 'Central African Republic', 'United States']
drop_list = [x for x in countries if any(w.lower() in x.lower()
                                         for w in filter_words) and x not in whitelist]


# Apply filter and convert year to datetime objects
df = (data
      .reset_index()
      .assign(year=lambda x: pd.to_datetime(x.year))
      .set_index(['country', 'year'])
      .drop(level=0, labels=drop_list)
      .sort_values(by=['country', 'year']))

The data looks like this:

print(df.head().to_string())

                        NY.GDP.PCAP.CD
country     year                      
Afghanistan 1960-01-01             NaN
            1961-01-01             NaN
            1962-01-01             NaN
            1963-01-01             NaN
            1964-01-01             NaN

You might be more used to working with wide data that looks like this:

print(df.unstack('country').head().iloc[:, :5].to_string())

           NY.GDP.PCAP.CD                                           
country       Afghanistan Albania     Algeria American Samoa Andorra
year                                                                
1960-01-01            NaN     NaN  239.033006            NaN     NaN
1961-01-01            NaN     NaN  209.917178            NaN     NaN
1962-01-01            NaN     NaN  169.927013            NaN     NaN
1963-01-01            NaN     NaN  225.823391            NaN     NaN
1964-01-01            NaN     NaN  238.877805            NaN     NaN

At least I was when I come over to PyMC. It turns out to be much less convenient to work with, so if you have data that looks like this, you want to put it into long format.

From here, we can create indexing vectors and labels using pd.factorize on the index. The factorize docs have very nice examples if you’ve never seen this function before.

country_idx, country = pd.factorize(df.index.get_level_values(0))
year_idx, year = pd.factorize(df.index.get_level_values(1))

Now we’re ready to make a PyMC model.

year_data = df.index.get_level_values(1)

# Since we now have two dimensions to juggle (time and country), we have to build t separately.
# We just want one time vector whose length is the number of time **labels** in the dataset.
t = np.arange(len(years))
t_max = max(t)

# Prophet hyperparameters
n_changepoints = 10
p = 12
n_fourier = 6 # Fully saturated, because n = p / 2. Can't go higher than this

coords = {
    'country':countries,
    'year':years,
    'changepoint': np.arange(n_changepoints, dtype='int'),
    'fourier_feature': [f'{f}_{i}' for f in ['cos', 'sin'] for i in range(n_fourier)],
    
    # Dummy index representing the multi-index on the data. We will replace this
    # with a real multi-index after sampling, because we're not allowed to specify
    # a multi-index inside a PyMC model.
    'obs_idx': np.arange(df.shape[0], dtype='int')
}

with pm.Model(coords=coords) as prophet_model:
    t_pt = pm.Data('t_pt', t, dims=['year'])
    
    year_idx_pt = pm.Data('year_idx', year_idx, dims=['obs_idx'])
    country_idx_pt = pm.Data('country_idx', country_idx, dims=['obs_idx'])

    A, s, X = generate_features(t_pt, 
                                t_max, 
                                n_changepoints=n_changepoints, 
                                n_fourier=n_fourier, 
                                p=p)
    
    # This is NOT necessary and does not change the model. It will just waste memory
    # by saving these matrices to the trace for every sample. I am doing it to make the 
    # shapes as obvious as possible
    A = pm.Deterministic('A', A, dims=['year', 'changepoint'])
    s = pm.Deterministic('s', s, dims=['changepoint'])
    X = pm.Deterministic('X', X, dims=['year', 'fourier_feature'])
    
    # Every country gets its own slope and an intercept
    initial_slope = pm.Normal('initial_slope', sigma=100, dims=['country'])
    initial_intercept = pm.Normal('initial_intercept', sigma=100, dims=['country'])
    
    # Every country gets its own changepoints    
    deltas = pm.Normal('offset_delta', sigma=100, dims=['country', 'changepoint'])
    
    # Every country gets its own seasonality coefficents (but they all share the feature matrix!)
    beta = pm.Normal('beta', sigma=100, dims=['country', 'fourier_feature'])
        

    # Now we have to reason about shapes. From here on out, things need to be shape (obs_idx, )
    # so we have to "blow up" the appropriate dimensions using our index data.
    # For example, the intercept formula is: initial_intercept + ((-s * A) * deltas).sum(axis=1)
    # Rewritten using shapes:
    # (country,) + ((changepoint,) * (year, changepoint) * (country, changepoint)).sum(dim="changepoint")
    
    # "country" and "year" each need to be blown up to "obs_idx". Then s needs to be given a dummy index
    # so the term (1, changepoint) * (year, changepoint) correctly broadcasts to (year, changepoint)
    
    # Saving these intermediate computations is not necessary. I'm trying to be as verbose as possible
    long_initial_intercept = initial_intercept[country_idx_pt] # (obs_idx,)
    middle_term = -s[None] * A[year_idx_pt] # (obs_idx, changepoint)
    long_deltas = deltas[country_idx_pt] # (obs_idx,)
    
    intercept = pm.Deterministic('intercept', 
                                 long_initial_intercept + (middle_term * long_deltas).sum(axis=1),
                                 dims = ['obs_idx'])
    
    # Similar logic with slope. Here I skip the intermediate compuations and I don't wrap it in pm.Deterministic
    # to emphasize that neither is necessary
    
    slope = initial_slope[country_idx_pt] + (A[year_idx_pt] * deltas[country_idx_pt]).sum(axis=1)
    
    # One last detail, we can't do X @ beta for the regression, because the shapes don't match --
    # (year, fourier) @ (country, fourier). Instead, we have to blow up each to 
    # (obs_idx, fourier), (obs_idx, fourier) then do the product and sum that define a dot "by hand"
    
    mu = pm.Deterministic('mu', 
                          (
                              intercept
                              + slope * t_pt[year_idx_pt] 
                              + (X[year_idx] * beta[country_idx]).sum(axis=-1)
                          ),
                          dims=['obs_idx'])
                          
    sigma = pm.Exponential('sigma', 1, dims=['country'])
    y_hat = pm.Normal('y_hat', mu=mu, sigma=sigma[country_idx_pt], 
                      observed=df.values.squeeze(), 
                      shape=country_idx_pt.shape[0],
                      dims=['obs_idx'])
    
    idata = pm.sample(nuts_sampler='numpyro', chains=8, draws=500)
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)

This model is pretty chonky, so it takes a bit of time to sample (~1 hour on my macbook). I strongly recommend using a JAX sampler like numpyro.

Once you’re done, a last important step to make working with the resulting idata is to replace the obs_idx dummy index with a multi-index:

import xarray as xr

xr_idx = xr.Coordinates.from_pandas_multiindex(df.index, 'obs_idx')
idata = idata.assign_coords(xr_idx)

Now you can easily look at specific outputs:

import matplotlib.pyplot as plt
import arviz as az
fig, ax = plt.subplots(figsize=(14, 4), dpi=144)
mu = idata.posterior_predictive.y_hat.mean(dim=['chain', 'draw'])
hdi = az.hdi(idata.posterior_predictive.y_hat).y_hat

to_plot = ['Senegal', 'Mongolia', 'Algeria', 'Afghanistan']
for i, plot_country in enumerate(to_plot):
    x_grid = hdi.sel(country=plot_country).coords['year']
    c = plt.color_sequences['Dark2'][i]
    mu.sel(country=plot_country).plot.line(x='year', ax=ax, color=c, label=plot_country)
    ax.fill_between(x_grid,
                    *hdi.sel(country=plot_country).values.T,
                    alpha=0.25,
                    color=c)
    ax.plot(x_grid, df.loc[plot_country].values, lw=2, c=c, ls='--')
ax.set(title = 'GDP per Capita', ylabel='Constant 2015 USD')
ax.legend()
plt.show()

You can see that we got automatic interpolation for the countries with missing values (Afghanistan and Mongolia in the graph). But this model is just 217 models stapled together. That’s because there’s no pooling anywhere. Each country gets all its own parameters, and even it’s own sigma. So that amounts to one model per country (totally unpooled). Adding pooling to the parameters would just mean changing the priors on the initial_slope, initial_intercept, changepoint and beta. There are a lot of resources available if you’re stuck on that point. I recommend the radon case study as the best place to start.

Forcasting will also be the same as above, but you will need to pass new values for year_idx_pt, and country_idx_pt, along with the new timepoints t.

Topic		Replies	Views
How do I make future predictions in a time series model with out of sample data? version agnostic prediction	12	2598	April 28, 2023
Forecasting with pymc3 Questions	1	2568	November 24, 2017
How can one bring in predictive variables to a time series model? v3 modeling	14	1837	August 31, 2022
How can i sample posteriors from training-data, run posterior predictive on testing data (with complex factor graph) v5 modeling	9	685	April 23, 2023
Sample_posterior_predicitve not catching shape of new data v5 prediction	10	1305	August 24, 2022

Best Practices for Time Series Forecasting

Non-Recursive Models

Example 1: Deterministic Trend

Data Generation

Sampling

Prediction and Forecasting

Example 2: Prophet

Model

Forcasting

Recursive Models

Generate Data

PyMC Model

Prediction and Forecasting

What about more complex models?

Conclusion

Related topics