Multivariate Stochastic Volatility: Implementing a Time-Dependent Covariance?

Hi, I’m relatively new to defining custom distributions, so any basic pointers are welcome. For context, I’m working on building a multivariate stochastic volatility model (an extension of the example in the PyMC3 docs). Let’s say that I have n dimensions; I am trying to implement the model:

\log(r_i) \sim t(\nu, \mu, \Sigma_i)
s_i \sim \text{MvNormal}(s_{i-1}, \Sigma_\sigma)
v_i = \exp(-2 s_i)
\Sigma_i = \text{diag}(v_i) \cdot \Pi \cdot \text{diag}(v_i)

where \nu is the degrees-of-freedom parameter, \mu is a vector of constant mean returns, and \Pi is a constant correlation matrix. I’m essentially defining a multivariate (correlated) random walk for the latent volatility process s_i, and then constructing the covariance matrix \Sigma_i for the observed returns at each time t_i by scaling the constant correlation matrix \Pi by a positive function v_i of the volatility process (representing the volatility in each dimension).

My trouble is this: from what I can tell, the standard MvStudentT RV in PyMC3 (docs, source) cannot handle the case in which each of my observations has a different covariance. So to complete my model, I gather that I need to define a custom distribution to handle this case by either

  1. defining the logp function and using pm.DensityDist (described here; a rough sketch of this option is below), or
  2. defining the logp function and using pm.Potential.
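For reference, my understanding of the pm.DensityDist pattern (option 1) is roughly the sketch below. The logp_fn here is just a placeholder Normal, and `data` stands in for my observations; it is not my actual likelihood:

import pymc3 as pm

with pm.Model() as sketch:
    mu = pm.Normal("mu", 0.0, 1.0)

    def logp_fn(value):
        # value receives the observed data; return the (summed) log-density
        return pm.Normal.dist(mu=mu, sigma=1.0).logp(value).sum()

    obs = pm.DensityDist("obs", logp_fn, observed=data)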

I have tried the second approach, but ran into some trouble. My current code is below:


import numpy as np
import pymc3 as pm
import theano.tensor as tt


def batch_mult_t_logp(mu, nu, vols, corr, nobs, obs):
    '''
    Log probability for a batch of multivariate Student-T observations,
    each with its own covariance matrix.
    '''
    logp = 0.0
    for j in range(nobs):
        # Scale the constant correlation matrix by the volatilities at time j.
        vol_array = tt.nlinalg.diag(vols[j, :])
        cov_array = tt.nlinalg.matrix_dot(vol_array, corr, vol_array)
        logp += pm.MvStudentT.dist(mu=mu, nu=nu, cov=cov_array).logp(obs[j, :])

    return logp

with pm.Model() as model:
    
    # Latent volatility.
    vol_chol, vol_corr, vol_stds = pm.LKJCholeskyCov(
        "vol_chol", n=ndim, eta=2.0, sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )
    vol_cov = pm.Deterministic("vol_cov", vol_chol.dot(vol_chol.T))
    vols = pm.MvGaussianRandomWalk("alpha", mu=0, shape=(nobs, ndim), chol=vol_chol)
    vol_process = pm.Deterministic('vol_process', tt.exp(-2*vols))
    
    # Prior over correlation matrices for observed returns
    corr_nu = pm.Uniform('corr_nu', 0, 5)
    C_triu = pm.LKJCorr('C_triu', corr_nu, ndim)
    # Expand the packed upper triangle into a full correlation matrix
    # (note: this indexing trick only reconstructs the full matrix correctly for ndim == 2).
    C = pm.Deterministic('C', tt.fill_diagonal(C_triu[np.zeros((ndim, ndim), dtype=np.int64)], 1.))
    
    # Prior for mean of observed returns
    mu = pm.Normal("mu", 0.0, 1.0, shape=ndim)
    
    # Tail parameter
    nu1 = pm.HalfNormal("nu_minus_2", sigma=1)
    nu = pm.Deterministic("nu", 2.0+nu1)
    
    # Observed RV
    batch_mult_t = pm.Potential("likelihood", batch_mult_t_logp(mu, nu, vol_process, C, nobs, data))

I know that I should probably be using Theano’s scan() instead of a for loop. Any pointers/examples would be much appreciated!

I managed to code a working version of my model (sampling successful, at least) with theano.scan(), but unfortunately it isn’t practical due to the extremely long time it takes to draw samples. It’s currently sampling at a rate of about 0.07 samples per second, and that is just with ndim = 2 and nobs = 100. For my application I intend to have ndim ~ 10 and nobs ~ 1000.

A brief explanation of the code: to evaluate the MvStudentT logp over a batch of observations, each with a different covariance matrix, I used pm.math.batched_diag() to form a tensor holding the standard deviations in each dimension, and then tt.batched_dot() to build a 3-dimensional tensor of shape (nobs, ndim, ndim) containing the covariance matrix associated with each observation (the constant correlation matrix scaled by the volatilities). The bottleneck, I think, comes from the last step, in which I used theano.scan() to iterate over the first dimension of my tensors so that I could conveniently call MvStudentT.dist(…).logp() on the slices.

My follow-up question in this thread is: can this be sped up at all? Or, is this model as I’ve specified it “doomed” to always sample incredibly slowly?

I have a hopeful suspicion that there is a better way to “vectorize” MvStudentT.dist(…).logp() over observations with different covariance matrices, but ¯\_(ツ)_/¯. (One rough idea along these lines is sketched after the code below.)

My latest code:

import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt

nobs, ndim = data.shape

with pm.Model() as model:

    data_ = pm.Data("data", data)
    
    # Model for volatility.
    vol_chol, vol_corr, vol_stds = pm.LKJCholeskyCov(
        "vol_chol", n=ndim, eta=2.0, sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )
    vol_cov = pm.Deterministic("vol_cov", vol_chol.dot(vol_chol.T))
    vols = pm.MvGaussianRandomWalk("alpha", mu=0, shape=(nobs, ndim), chol=vol_chol)
    vol_process = pm.Deterministic('vol_process', tt.exp(-2*vols))
    
    # Prior over correlation matrices
    corr_nu = pm.Uniform('corr_nu', 0, 5)
    C_triu = pm.LKJCorr('C_triu', corr_nu, ndim)
    
    # Define correlation matrix for observed returns
    C = pm.Deterministic('C', tt.fill_diagonal(C_triu[np.zeros((ndim, ndim), dtype=np.int64)], 1.))
    
    # Prior for mean of observed returns
    mu = pm.Normal("mu", 0.0, 1.0, shape=ndim)
    
    # Tail parameter
    nu1 = pm.HalfNormal("nu_minus_2", sigma=1)
    nu = pm.Deterministic("nu", 2.0+nu1)
    
    # Now the hard theano op
    mult_corr = tt.repeat(C[np.newaxis, :, :], nobs, axis=0)
    batch_diags = pm.math.batched_diag(vol_process)
    
    # Multiply from the right
    r_mult = tt.batched_dot(mult_corr, batch_diags)

    # Multiply from the left
    cov_matrices = pm.Deterministic("cov_mats", tt.batched_dot(batch_diags, r_mult))
    
    result, updates = theano.scan(
        fn=lambda covs, obs, means, tails: pm.MvStudentT.dist(mu=means, nu=tails, cov=covs).logp(obs),
        outputs_info=None,
        sequences=[cov_matrices, data_],
        non_sequences=[mu, nu],
    )
    
    like = pm.Potential("like", result.sum())
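Circling back to my suspicion above: one untested idea (just a rough sketch, and I haven’t verified that it matches pm.MvStudentT’s parametrization or checked it numerically) is to write the batched Student-T log density out by hand. Since \Sigma_i = diag(v_i) \Pi diag(v_i), we get \Sigma_i^{-1} = diag(1/v_i) \Pi^{-1} diag(1/v_i) and \log|\Sigma_i| = \log|\Pi| + 2 \sum_k \log v_{ik}, so the whole thing can be vectorized over observations with no scan at all:

def batched_mvt_logp(obs, mu, nu, vols, corr, ndim):
    # obs: (nobs, ndim) data, vols: (nobs, ndim) per-time stdevs, corr: (ndim, ndim)
    z = (obs - mu) / vols                                    # standardized residuals
    corr_inv = tt.nlinalg.matrix_inverse(corr)
    quad = tt.sum(z.dot(corr_inv) * z, axis=1)               # z_i^T Pi^{-1} z_i per observation
    logdet = tt.log(tt.nlinalg.det(corr)) + 2.0 * tt.sum(tt.log(vols), axis=1)
    return tt.sum(
        tt.gammaln((nu + ndim) / 2.0)
        - tt.gammaln(nu / 2.0)
        - 0.5 * ndim * tt.log(nu * np.pi)
        - 0.5 * logdet
        - 0.5 * (nu + ndim) * tt.log1p(quad / nu)
    )

# which would replace the scan block above with something like:
# like = pm.Potential("like", batched_mvt_logp(data_, mu, nu, vol_process, C, ndim))

I haven’t benchmarked this, and I may well have a constant or sign wrong, so treat it as a direction rather than working code.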

I didn’t look much at the code, but I noticed you have a prior over the LKJ correlation’s eta parameter. That might be contributing to the slowness.

I suggest you set a fixed value for eta to see if that helps. eta = 1 implies a uniform prior over (-1, 1) for the correlations; higher eta puts more mass around zero.
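For example (just a sketch, keeping your variable names):

# drop the corr_nu prior and fix eta directly
C_triu = pm.LKJCorr('C_triu', eta=1.0, n=ndim)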


Thanks! I’m not sure where the 5 crept into my thinking, but I switched to eta = 1 and now my sampling rate is about 0.17 samples per second. Slightly faster, but still not practical.
