We should first distinguish between recursive and nonrecursive timeseries models. @kforeman hinted at this distinction, which I just want to explore formally. Before we start, I want to address the numbered questions in the OP:

In the nonrecursive case, use
MutableData
. In the recursive case, it might be necessary to do something more complex. I will present some solutions that work for me, but @ricardoV94 is working on some more sophisticated/generalization approaches with model cloning and reconstruction here. I won’t address any of these, but suffice to say that forcasting should be better more convenient in the future. But that’s just my forecast 
Since the length of variables is going to change, you will need to either omit coords or reset them after estimation. Personally I omit them at sampling time, because it’s a hassle.

The
predictions=True
flag has nothing at all to do with forecasting. AFAIK, it just makes a new group in youridata
object. This allows you to, in the same object, hold posterior predictive samples for both the training and test data. If you do not set new data withpm.set_data
and runpm.sample_posterior_predictive(predictions=True)
, you will just end up with the training distribution, stored in a different place.
On to some specifics:
NonRecursive Models
These are models in which time simply enters as a covariate. The model the OP posted sort of falls into this class. But actually the model he posted is more like a “crosssection of times”, since there is no time variable. I am splitting hairs here because in order to forecast, as opposed to predict, we should be able to arbitrarily extend the data out into the future. In the posted case, we need the covariate observations from the future to make forecasts, which is of course impossibe.
What I have in mind is instead more like a deterministic trend model: y_t = \alpha + \gamma t + \epsilon_t. Here, we have a time variable t, which we can arbitrarily extend out into the future because it’s just a counter. Here’s a code example:
Example 1: Deterministic Trend
Data Generation
true_alpha = 1
true_gamma = 0.2
true_sigma = 1.5
T = 100
noise = rng.normal(scale=true_sigma, size=T)
t = np.arange(T)
data = true_alpha + true_gamma * t + noise
train_data = data[:90]
test_data = data[10:]
plt.plot(train_data, color='tab:blue')
plt.plot(np.arange(90, 100), test_data, ls='', color='tab:red')
Sampling
with pm.Model() as det_trend:
t_pt = pm.MutableData('t', t[:90])
alpha = pm.Normal('alpha')
gamma = pm.Normal('gamma')
mu = pm.Deterministic('mu', alpha + gamma * t_pt)
sigma = pm.Exponential('sigma', 1)
y_hat = pm.Normal('y_hat', mu=mu, sigma=sigma, observed=train_data, shape=t_pt.shape)
idata = pm.sample()
Prediction and Forecasting
with det_trend:
#insample
idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)
#outofsample
pm.set_data({'t':t[10:]})
idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True, predictions=True)
In this case I generated the test data so I have it to plot and examine, but you can see that in principle I can extend these forecasts out forever by just setting the t
variable to whatever I want, even if we don’t have any new data yet. Obviously in this example it doesn’t generate very exciting forecasts, but this methodology is actually very powerful because nothing prevents you from using nonlinear functions of t
. In this next example I make forecasts on real data using a prophetlike model, which uses a piecewise linear trend and Fourier seasonality to create rich time dynamics that can be arbitrarily extended into the future:
Example 2: Prophet
Data are from Forecasting: Principles and Practice. I centered and normalized the values so I could autopilot though the prior definitions.
Model
First I make some functions that take in t
and return some nonlinear transformations:
def create_piecewise_trend(t, t_max, n_changepoints):
s = pt.linspace(0, t_max, n_changepoints+2)[1:1]
A = (t[:, None] > s)*1
return A, s
def create_fourier_features(t, n, p=365.25):
x = 2 * np.pi * (pt.arange(n)+1) * t[:, None] / p
return pt.concatenate((pt.cos(x), pt.sin(x)), axis = 1)
def generate_features(t, t_max, n_changepoints=10, n_fourier=6, p=365.25):
A, s = create_piecewise_trend(t, t_max, n_changepoints)
X = create_fourier_features(t, n_fourier, p)
return A, s, X
Then use these in my PyMC model:
t = np.arange(df.shape[0])
# This value isn't on the computation graph (it's not computed dynamically from `t_pt`)
t_max = max(t)
with pm.Model() as prophet_model:
t_pt = pm.MutableData('t', t)
# We have monthly data, so p=12 leads to annual seasonality
A, s, X = generate_features(t_pt, t_max, n_changepoints=10, n_fourier=6, p=12)
initial_slope = pm.Normal('initial_slope')
initial_intercept = pm.Normal('initial_intercept')
# n_changepoint offsets terms to build the peicewise trend
deltas = pm.Normal('offset_delta', shape=(10,))
intercept = initial_intercept + ((s * A) * deltas).sum(axis=1)
slope = initial_slope + (A * deltas).sum(axis=1)
# n_fourier * 2 seasonal coefficients
beta = pm.Normal('beta', size=12)
mu = pm.Deterministic('mu', intercept + slope * t_pt+ X @ beta)
sigma = pm.Exponential('sigma', 1)
y_hat = pm.Normal('y_hat', mu=mu, sigma=sigma, observed=df.values.ravel(), shape=t_pt.shape[0])
idata = pm.sample()
Forcasting
with prophet_model:
#insample
idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)
#outofsample, predict 3 years of home sales
last_t = t[1]
forcast_t = np.arange(last_t, last_t + 36)
pm.set_data({'t':forcast_t})
idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True, predictions=True)
Now I’m working with real data, and I didn’t set aside any test set on purpose. I want to demonstrate that you can just plug in any value for t
that you want and just go hog wild predicting the future. What could possibly go wrong?
Recursive Models
Recursive models are slightly more complex, because we are learning some policy function y_{t+1} = g(y_t). Making forecasts, then, isn’t as simple as just doing pm.set_data
, but it also isn’t much more complex. Basically, we’re going to:
 Have an unknown distribution
x0
that descirbes the initial state of the system before we see data  From this initial
x0
, sample the posterior \theta s that parameterize the g function (includingx0
)  Replace
x0
with aDiracDelta(y[1])
, a spike distribution over the last observed data point  Simulate forward using the g function
Here’s a simple example using a Guassian random walk. In this case, y_{t+1} = g(y_t) = y_t + \epsilon_t, and we just have to learn one parameter, \sigma. It’s easy to implement this model in PyMC using .cumsum()
:
Generate Data
true_sigma = 0.1
T = 100
innovations = rng.normal(scale=true_sigma, size=T)
data = innovations.cumsum()
plt.plot(data)
PyMC Model
with pm.Model() as grw:
sigma_innov = pm.Exponential('sigma_innov', 1)
innovs = pm.Normal('innovations', sigma=sigma_innov, size=100)
sigma = pm.Exponential('sigma', 1)
y_hat = pm.Normal('y_hat', mu=innovs.cumsum(), sigma=sigma, observed=data)
idata = pm.sample(init='jitter+adapt_diag_grad', target_accept=0.99)
Prediction and Forecasting
# First the easy part, insample prediction
with grw:
idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)
# Next simulate the recursive model
H = 25
simulations = np.empty((n_chains, n_draws, H))
x0 = data[1]
simulations[:, :, 0] = x0
# simulate the forward model
for t in range(1, H):
simulations[:, :, t] = simulations[:, :, t1] + rng.normal(scale=idata.posterior.sigma_innov) + rng.normal(scale=idata.posterior.sigma)
# Add the simulations to the idata in a "predictions" group
# Congrats, you just did a posterior predictive sampling by hand
import xarray as xr
idata.add_groups({'predictions':xr.Dataset(data_vars={'y_hat':(['chain', 'draw', 'y_hat_dim_2'], simulations)},
coords={'chain':np.arange(n_chains),
'draw':np.arange(n_draws),
'y_hat_dim_2':np.arange(T, T+H)})})
Again, we have a nice funnel of increasing uncertainty, so we can be somewhat confident that nothing went wrong. We also know that the g function is basically the identity, so we should also be happy with the fact that our forecasts are a straight line out from the last data point.
What about more complex models?
Obviously, the GRW is very simple, so doing the forward simulations by hand is really convenient. If we have a more complex model, it might be quite onerous to do this. Many time series models can be cast in linear state space form, though. For example, SARIMA, VARMAX, and exponential smoothing all admit a statespace representation. If you can write down a statespace system, it’s quite easy to do the forward simulations. You would also use this approach to run impulse response functions.
Other models, though, such has HMM or GARCH, don’t admit such a form, and it quickly becomes too much to ask to do the simulation. In that case, I suggest a hack: include missing values after your data to PyMC automatically make predictions for you. Your prediction horizon will be fixed in this case, but at least you won’t have to do anything special to get the predictions. Here’s the GRW model again using this method. The data is unchanged from the first example:
H = 25
extended_data = np.r_[data, np.full(H, np.nan)]
with pm.Model() as grw:
sigma_innov = pm.Exponential('sigma_innov', 1)
innovs = pm.Normal('innovations', sigma=sigma_innov, size=100 + H)
sigma = pm.Exponential('sigma', 1)
y_hat = pm.Normal('y_hat', mu=innovs.cumsum(), sigma=sigma, observed=extended_data)
idata = pm.sample(init='jitter+adapt_diag_grad', target_accept=0.99)
idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)
As you can see, we get (almost) the same output with less hassle. The only difference is that the forecasts begin from the last prediction, rather than the last data point (the orange line continues the blue line, rather than the red one).
Conclusion
So there you go. Some pointers on forecasting. All the code used is in this notebook, in case you’d like to look at it all gathered in one place. As I mentioned, there is new functionality that will make foreacsting with recursive models easier, it’s just that I haven’t played around with it much yet so I can’t really say anything about it.