Getting started with a rolling covariance matrix estimate?

newquant · November 16, 2018, 8:26pm

Hello, I’m new to PyMC3 and to PPLs in general. I have a matrix of stock returns and I’m trying to use PyMC3 to do a rolling estimate of the covariance matrix. I’ve seen the rolling regression and stochastic volatility examples, but can’t quite figure out how to generalize them to N stocks and estimate the full covariance matrix.

Any guides on this?

junpenglao · November 16, 2018, 8:48pm

@aseyboldt and @twiecki have done some related work.

mmargenot · November 16, 2018, 10:29pm

Hey newquant,

I’ve done a bit of this using the GaussianRandomWalk prior. I haven’t posted the full notebook anywhere yet, but I put together a talk on the work that you can find here.

The basic idea is that I construct a lower triangular matrix where each element is an exponentiated GaussianRandomWalk, use it as a Cholesky factor to build the covariance matrix, and pass that to a multivariate normal (MvNormal) likelihood. Hope this helps!
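A minimal NumPy sketch of this construction, simulating from the prior rather than fitting a model. Assumptions, not from the post: the function name `random_walk_covariances`, the step size, and exponentiating only the diagonal of the Cholesky factor (which keeps it positive while letting off-diagonal entries, and hence covariances, take either sign); the post describes exponentiating every element. In a PyMC3 model the walks would be `GaussianRandomWalk` variables and the factors would go to an `MvNormal` likelihood, which accepts a `chol` argument.

```python
import numpy as np


def random_walk_covariances(n_assets, n_steps, step_sd=0.1, seed=0):
    """Return an (n_steps, n_assets, n_assets) stack of covariance matrices.

    Each free element of a lower-triangular Cholesky factor follows its own
    Gaussian random walk. Assumption: only the diagonal is exponentiated.
    """
    rng = np.random.default_rng(seed)
    n_tril = n_assets * (n_assets + 1) // 2
    # One random walk per lower-triangular element: cumulative sum of Gaussian steps.
    walks = np.cumsum(rng.normal(0.0, step_sd, size=(n_steps, n_tril)), axis=0)
    rows, cols = np.tril_indices(n_assets)
    diag = np.arange(n_assets)
    covs = np.empty((n_steps, n_assets, n_assets))
    for t in range(n_steps):
        chol = np.zeros((n_assets, n_assets))
        chol[rows, cols] = walks[t]
        # exp() keeps the diagonal strictly positive, so chol is a valid Cholesky factor.
        chol[diag, diag] = np.exp(chol[diag, diag])
        covs[t] = chol @ chol.T
    return covs


covs = random_walk_covariances(n_assets=4, n_steps=3)
print(covs.shape)  # (3, 4, 4)
# Every matrix is symmetric positive definite by construction.
print(np.all(np.linalg.eigvalsh(covs) > 0))  # True
```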

Cheers,
Max Margenot

newquant · November 17, 2018, 9:12am

Thanks Max! I will check this talk out.

newquant · November 20, 2018, 2:31pm

Hi Max. I was able to get an implementation of this working, but it runs extremely slowly! I’m talking > 1 hour for n_secs = 4 and 3 time segments. Metropolis is much faster, but it doesn’t seem to converge. Any tips on speeding it up?

cmorgan · November 20, 2018, 3:57pm

Hi newquant, I’m trying to get started with PyMC3 too. Could you post what you have so far to GitHub?

newquant · November 20, 2018, 4:22pm

This is what I have at the moment:

https://gist.github.com/newquant1/28277edc6713cc92ff9e01e751ac1c3b

bayes_covar.py

import pymc3 as pm
import os
import pandas as pd
import numpy as np
from theano import theano, scalar, tensor as tt
from tqdm import tqdm

def _expand_chol(n, var, od):

    sq = [[0.]*n for _ in range(n)]

(Excerpt only; the full file is in the gist linked above.)

mmargenot · November 26, 2018, 2:56pm

Looking at your code, I would increase the subsample_rate. The default of 1 gives you a huge number of variables because of the GaussianRandomWalk, which slows sampling down a lot. I had some success speeding things up with values in the 20-30 range.
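A back-of-the-envelope count of why a small subsample_rate gets expensive, assuming the all-elements-vary parameterization described earlier in the thread: one latent random-walk value per lower-triangular Cholesky element per time segment. The function name and the 252-day year of daily returns are illustrative assumptions, and the count ignores hyperparameters such as the random-walk step sizes.

```python
import math


def n_latent_values(n_assets, n_days, subsample_rate):
    """Latent random-walk values when every Cholesky element varies per segment."""
    n_segments = math.ceil(n_days / subsample_rate)
    n_chol = n_assets * (n_assets + 1) // 2  # lower-triangular elements incl. diagonal
    return n_chol * n_segments


# Hypothetical example: 4 assets, one year (252 trading days) of daily returns.
for rate in (1, 5, 21, 30):
    print(rate, n_latent_values(n_assets=4, n_days=252, subsample_rate=rate))
# 1 2520
# 5 510
# 21 120
# 30 90
```

The count shrinks linearly as the number of segments drops, but grows quadratically with the number of assets (11 assets already means 66 elements per segment).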

A subsample rate of around 21 also makes intuitive sense for a rolling covariance estimate, since it essentially gives you monthly covariances (there are roughly 21 trading days in a month).

Another concern with a low subsample_rate is that each likelihood term sees only a few observations (in the extreme case of subsample_rate=1, just one each).
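A small sketch of one common way to map observations to time segments with a subsample_rate (integer division of the time index), and how many observations each segment's likelihood then sees. This indexing scheme and the helper name `segment_index` are assumptions; the gist's actual indexing may differ.

```python
import numpy as np


def segment_index(n_obs, subsample_rate):
    """Assign each observation to a block of `subsample_rate` consecutive observations."""
    return np.arange(n_obs) // subsample_rate


idx = segment_index(n_obs=63, subsample_rate=21)  # roughly three months of daily data
print(np.bincount(idx))  # [21 21 21]: observations per segment's likelihood

idx = segment_index(n_obs=63, subsample_rate=1)
print(np.bincount(idx).max())  # 1: each segment gets a single observation
```

In a model, `idx` would select the per-segment covariance (or Cholesky) matrix for each day, so every return vector is scored against its own segment's matrix.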

newquant · November 26, 2018, 3:21pm

That makes sense, and I did find it faster with a higher subsample rate. But while monthly rebalancing (about 21 trading days) makes sense for many firms, others (including the one I work for) operate at much higher frequencies, rebalancing daily or even more often. So I wonder what the solution is for that use case.

Thanks for the help.

mmargenot · November 26, 2018, 5:33pm

If you’re trying to rebalance daily, I would see if you can get hold of intraday (minute-level) data to work with. With daily data you only get a single observation per day, which isn’t going to help you much. I think I got training down to around 20 minutes for ~11 assets with a subsample rate of 30 (on daily data), so it might be feasible to do an hourly or two-hourly rebalance.

However, as currently parameterized, it definitely has speed problems. Adding assets adds too many interrelations (the number of pairwise terms grows quadratically with the number of assets) for the problem to stay tractable as laid out. There may be a better parameterization than the one I worked on, but I haven’t dug particularly deep into it.

I’m also not entirely convinced that a time-varying Bayesian covariance is more beneficial than a static Bayesian covariance with time-varying volatilities. It might be worth placing a prior that is not indexed by time (such as an LKJ prior) on the lower-triangular, off-diagonal part of the covariance and keeping only the diagonal as a GaussianRandomWalk. I suspect that combining a distribution over the off-diagonal structure with time-varying volatilities might give you better performance while reducing the number of variables that NUTS has to sample.
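A minimal NumPy sketch of this alternative: one time-invariant correlation matrix Ω combined with per-asset log-volatilities h_t that follow Gaussian random walks, giving Σ_t = D_t Ω D_t with D_t = diag(exp(h_t)). Assumptions: the correlation matrix is made up for illustration (in a model it would get an LKJ prior; PyMC3 provides `LKJCorr` and `LKJCholeskyCov`), the step size is arbitrary, and `static_corr_covariances` is a hypothetical helper. The last lines compare time-indexed latent values against the fully time-varying Cholesky version.

```python
import numpy as np


def static_corr_covariances(corr, n_steps, step_sd=0.1, seed=0):
    """Sigma_t = D_t @ corr @ D_t with D_t = diag(exp(h_t)), h_t a Gaussian random walk."""
    rng = np.random.default_rng(seed)
    n_assets = corr.shape[0]
    # Only the n_assets log-volatilities are indexed by time.
    log_vol = np.cumsum(rng.normal(0.0, step_sd, size=(n_steps, n_assets)), axis=0)
    vol = np.exp(log_vol)
    # Broadcasting computes vol[t, i] * corr[i, j] * vol[t, j] for every t.
    return vol[:, :, None] * corr[None, :, :] * vol[:, None, :]


# Made-up correlation matrix, standing in for a draw from an LKJ prior.
corr = np.array([[1.0, 0.5, 0.2],
                 [0.5, 1.0, 0.3],
                 [0.2, 0.3, 1.0]])
covs = static_corr_covariances(corr, n_steps=12)

# Implied correlations are identical at every time step; only the scale moves.
sd = np.sqrt(np.diagonal(covs, axis1=1, axis2=2))
print(np.allclose(covs / (sd[:, :, None] * sd[:, None, :]), corr))  # True

# Time-indexed latent values for n assets and S segments
# (the static version also has n * (n - 1) // 2 = 3 time-invariant correlations):
n, S = 3, 12
print(S * n * (n + 1) // 2, S * n)  # 72 36 (full Cholesky walk vs. volatilities only)
```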

newquant · November 26, 2018, 5:49pm

^ This is all very insightful. Thanks

Related topics

Topic	Category	Replies	Views	Activity
Multivariate version of Rolling Regression Example	v5	4	557	January 26, 2023
[PyMC3] Non-centered parameterization for rolling regression	Questions	2	677	January 3, 2020
Rolling Regression with Time Blocks	Questions	1	502	April 22, 2018
Multivariate Stochastic Volatility: Implementing a Time-Dependent Covariance?	Questions	3	518	January 15, 2021
Vector Autoregressions?	Questions	4	1197	January 2, 2020