I want to do time series forecasting in PyMC3. I am open to suggestions, but what I have tried so far is modelling the series with a GaussianRandomWalk and appending NaNs to the dataset before sampling. PyMC3 then samples the missing datapoints along with everything else, which effectively gives a prediction for them. My questions are:
- Is this the best approach for this kind of forecasting?
- Can someone explain why the forecasted values diverge so rapidly when the observed data looks much more stable? See the figure.
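For reference, this is the missing-value mechanism I am relying on, shown as a minimal toy sketch (the model and variable names here are just for illustration): as far as I understand, PyMC3 replaces masked/NaN entries in the observed data with free random variables and samples them, so draws for them appear in the trace under the auto-generated name '<var>_missing'.

import numpy as np
import pymc3 as pm

# Toy data: three observed points followed by two "future" missing points
data = np.ma.masked_invalid(np.array([0.1, -0.2, 0.05, np.nan, np.nan]))

with pm.Model():
    mu = pm.Normal('mu', mu=0., sigma=1.)
    # PyMC3 turns the masked entries into free random variables and
    # samples them; they show up in the trace as 'y_missing'
    y = pm.Normal('y', mu=mu, sigma=1., observed=data)
    trace = pm.sample(500, progressbar=False)

print(trace['y_missing'].shape)  # (n_draws, 2): samples for the two NaNs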
The dataset is from the classic PyMC3 stochastic volatility example. The code is below:
import pandas as pd
import pymc3 as pm
import matplotlib.pyplot as plt
import numpy as np

returns = pd.read_csv(pm.get_data('SP500.csv'), parse_dates=True, index_col=0)

# Append 500 NaN rows; PyMC3 treats these as missing values and samples them.
# (A single concat replaces the original loop of repeated DataFrame.append calls.)
returns = pd.concat(
    [returns, pd.DataFrame({'change': np.full(500, np.nan)})],
    ignore_index=True)
print(returns)
with pm.Model() as sp500_model:
    nu = pm.Exponential('nu', 1/10., testval=5.)
    sigma = pm.Exponential('sigma', 1/0.02, testval=.1)
    # Latent log-volatility: a Gaussian random walk over the full, extended index
    s = pm.GaussianRandomWalk('s', sigma=sigma, shape=len(returns))
    volatility_process = pm.Deterministic('volatility_process', pm.math.exp(-2*s)**0.5)
    # The NaNs in 'change' are auto-imputed; their draws appear as 'r_missing'
    r = pm.StudentT('r', nu=nu, sigma=volatility_process, observed=returns['change'])
    trace = pm.sample(2000)

pm.traceplot(trace, var_names=['nu', 'sigma'])
plt.show()
fig, ax = plt.subplots(figsize=(15, 8))
returns.plot(ax=ax)
# Overlay thinned posterior draws of the volatility process, exp(-s)
ax.plot(returns.index, 1 / np.exp(trace['s', ::5].T), 'C3', alpha=.03)
ax.set(title='volatility_process', xlabel='time', ylabel='volatility')
ax.legend(['S&P500', 'stochastic volatility process'])
plt.ylim([-0.1, 0.15])
plt.show()

# Volatility process draws on their own
plt.plot(1 / np.exp(trace['s', ::5].T), 'C3', alpha=0.03)
plt.show()