I found some models in academic literature, as well as a Stan implementation here: 2.5 Stochastic Volatility Models | Stan User’s Guide. The Stan model does not actually implement the volatility jumps but I think it can be extended to that, provided it can be ported to pymc.
The issue is that the model uses different SDEs for the first log-volatility observation, h(t=1)
, and the subsequent observations h(t>1)
, as described in the above link and the original publication by Kim et al (1998).
I took a first stab at pymc implementation, subclassing EulerMaruyama
and overwriting the logp
method to handle the first and subsequent observations differently. Unfortunately, I think I did something wrong, as it is now very sensitive to the log-vol mu
prior, and if mu
is significantly negative (e.g., -1) it fails with a SamplingError: Initial evaluation of model at starting point failed!
. That is a problem because I think the true log-vol mean is somewhere in the -1 to -5 range, and the model won’t converge otherwise.
I would appreciate any feedback, ideas on what I might have done wrong, or alternative ways to implement. Here’s the code below. Thank you, and Happy New Year!
import theano
import theano.tensor as at
import pymc3 as pm
from pymc3.distributions.timeseries import EulerMaruyama
class EulerMaruyama_SVJ(EulerMaruyama):
def logp(self, x):
# at t1
f, g = self.h1_sde(*self.sde_pars)
mu1 = (x[0] + self.dt * f).reshape(-1, 1)
sd1 = (at.sqrt(self.dt) * g).reshape(-1, 1)
# at t > 1
x_gte1 = x[1:-1]
theano.config.compute_test_value = 'ignore'
f, g = self.h_gt1_sde(x_gte1, *self.sde_pars)
mu_gt1 = (x_gte1 + self.dt * f).reshape(-1, 1)
sd_gt1 = (at.sqrt(self.dt) * g).reshape(-1, 1)
# concatenate and call logp
mu = at.concatenate([mu1, mu_gt1], axis=0)
sd = at.concatenate([mu1, mu_gt1], axis=0)
return at.sum(pm.Normal.dist(mu=mu, sd=sd).logp(x[1:]))
def h1_sde(self, mu, phi, sigma):
return (mu, sigma / at.sqrt(1 - phi * phi))
def h_gt1_sde(self, x, mu, phi, sigma):
return (mu + phi * (x - mu), sigma)
def make_SVJ_model(data):
"""
See:
https://mc-stan.org/docs/2_21/stan-users-guide/stochastic-volatility-models.html
"""
with pm.Model(coords={'time': data.index.values}) as model:
# SDE params
mu = pm.Cachy('mu', 0, 10) # mean log-volatility
phi = pm.Uniform('phi', -1, 1) # volatility persistence
sigma = pm.Normal('sigma', 0, 5)
h = EulerMaruyama_SVJ('h',
dt=1.0,
sde_fn=None, # SDEs are defined inside the class
sde_pars=(mu, phi, sigma),
dims='time',
testval=np.ones_like(data),
)
# likelihood
y = pm.Normal('obs', 0, pm.math.exp(h/2), observed=data, dims='time')
return model
So when I call:
import numpy as np
import yfinance as yf
data = yf.download('SPY')
returns = np.log(data['Adj Close']).diff().dropna()
m = make_SVJ_model(returns.iloc[-500:])
with m:
trace = pm.sample(4000, tune=1000, return_inferencedata=True)
It fails with a:
SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'mu': array(-1.), 'phi': array(0.95), 'sigma': array(0.25), 'h': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
[...]
Initial evaluation results:
mu 3.69
phi 3.69
sigma 3.69
h -inf
obs -709.49
Name: Log-probability of test_point, dtype: float64