Hello, I’m new to PyMC3 and PPLs in general. I have a matrix of stock returns and I’m trying to use PyMC3 to do a rolling estimate of the covariance matrix. I’ve seen the rolling regression and stochastic volatility examples, but can’t quite figure out how to generalize this to N stocks and estimate the full covariance matrix.
I’ve done a bit of this using the GaussianRandomWalk prior. I haven’t posted the full notebook anywhere yet, but I put together a talk on the work that you can find here.
The basic idea is that I construct a lower triangular matrix where each element is an exponentiated GaussianRandomWalk and use that to build the covariance matrix, which I then use as input to a Multivariate Normal likelihood.
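In rough outline, it looks something like the untested sketch below. The data shapes and variable names are placeholders of mine, not the exact code from the talk; note that exponentiating every entry keeps the diagonal positive (so each factor is a valid Cholesky factor) but also constrains the covariances to be positive.

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# returns_buckets: hypothetical (n_buckets, subsample_rate, n_assets) array
# of daily returns grouped into buckets, one covariance step per bucket
T, S, N = returns_buckets.shape
n_tril = N * (N + 1) // 2            # entries in the lower triangle
tril_idx = np.tril_indices(N)

with pm.Model() as model:
    # one independent random walk per lower-triangular entry;
    # time runs along the first axis of the shape
    walks = pm.GaussianRandomWalk("walks", sigma=0.1, shape=(T, n_tril))

    for t in range(T):
        # exponentiate the walk values and pack them into a lower-triangular
        # factor L_t; Sigma_t = L_t @ L_t.T is then positive definite
        L_t = tt.set_subtensor(tt.zeros((N, N))[tril_idx], tt.exp(walks[t]))
        pm.MvNormal(f"obs_{t}", mu=np.zeros(N), chol=L_t,
                    observed=returns_buckets[t])

    trace = pm.sample(1000, tune=1000)
```

Hope this helps!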
Hi Max. I was able to get an implementation of this working, but it runs extremely slowly! I’m talking > 1 hour for n_secs = 4 and 3 time segments. Using Metropolis is much faster, but it doesn’t seem to really converge. Any tips for speeding it up?
Looking at your code, I would increase the subsample_rate. A default of 1 gives you a huge number of variables, due to the GaussianRandomWalk, which slows the sampling down a lot. I had some success speeding things up with values in the 20-30 range.
Having a subsample rate of around 21 also makes intuitive sense with a rolling covariance estimate, as it essentially gives you the monthly covariances (21 trading days in a month).
Another concern with a low subsample_rate is the accompanying low number of observations informing each likelihood term (in the extreme case of subsample_rate=1, only one observation each).
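For concreteness, the subsampling might look something like this (a hypothetical helper, not code from the notebook):

```python
import numpy as np

def bucket_returns(daily_returns, subsample_rate=21):
    """Group a (n_days, n_assets) array of daily returns into buckets of
    subsample_rate days; each bucket then shares one covariance step."""
    n_days, n_assets = daily_returns.shape
    n_buckets = n_days // subsample_rate          # drop the ragged tail
    trimmed = daily_returns[:n_buckets * subsample_rate]
    return trimmed.reshape(n_buckets, subsample_rate, n_assets)

# ~5 years of daily data for 4 assets: 1260 walk steps at subsample_rate=1,
# but only 60 steps (with 21 observations each) at subsample_rate=21
daily = np.random.normal(0.0, 0.01, size=(1260, 4))
buckets = bucket_returns(daily)                   # shape (60, 21, 4)
```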
That makes sense, and I did find it faster with a higher subsample rate. And while monthly rebalancing / 21 trading days in a month makes sense for many firms, there are other firms (including the one I work for) that operate at much higher frequencies, with daily (if not more frequent) rebalancing. So I wonder what the solution is for that use case.
If you’re trying to rebalance daily, I would see if you can get your hands on intraday, minute-level data to work with; a single observation per day from daily data isn’t going to help you out much. I think I managed to get training down to around 20 minutes for ~11 assets with a subsample rate of 30 on daily data, so an hourly or two-hourly rebalance might be feasible.
However, as it is currently parameterized, it definitely has speed problems. Increasing the number of assets just adds too many interrelated variables for the problem to stay tractable as laid out. There may be a better parameterization than the one I have worked on, but I haven’t dug particularly deep into it.
I’m also not entirely convinced that a fully time-varying Bayesian covariance is more beneficial than a static Bayesian correlation structure with time-varying volatilities. It might be worth placing a prior that is not indexed by time (such as an LKJ prior) on the off-diagonal part of the covariance and keeping the diagonal volatilities as GaussianRandomWalks. I suspect that a static distribution over the correlations in conjunction with time-varying volatilities might give you better performance while reducing the number of variables that NUTS has to sample.
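Something like this untested sketch of the idea, using pm.LKJCholeskyCov for the static part (shapes and names are placeholders; returns_buckets is the same hypothetical bucketed array as above):

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# returns_buckets: hypothetical (n_buckets, subsample_rate, n_assets) array
T, S, N = returns_buckets.shape

with pm.Model() as model:
    # static cross-sectional structure: LKJ prior on a packed Cholesky factor
    packed = pm.LKJCholeskyCov("packed", n=N, eta=2.0,
                               sd_dist=pm.HalfNormal.dist(1.0))
    L_static = pm.expand_packed_triangular(N, packed)

    # time-varying volatilities: one exponentiated random walk per asset
    # (kept near zero so the static scales from sd_dist set the overall level)
    log_vol = pm.GaussianRandomWalk("log_vol", sigma=0.1, shape=(T, N))

    for t in range(T):
        # scale the static factor by the current volatilities:
        # Sigma_t = D_t (L L.T) D_t with D_t = diag(exp(log_vol[t]))
        L_t = tt.diag(tt.exp(log_vol[t])).dot(L_static)
        pm.MvNormal(f"obs_{t}", mu=np.zeros(N), chol=L_t,
                    observed=returns_buckets[t])

    trace = pm.sample(1000, tune=1000)
```

This replaces the T * N(N-1)/2 off-diagonal walk variables with a single static set of N(N+1)/2, which is where I’d expect most of the speedup to come from.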