I’m using a multi-output latent coregional GP with a Poisson likelihood to model a collection of time series and the idea is to be able to model the individual behaviour of each time series but also capture some latent behaviour. Tests with reduced datasets are yielding interesting results but I’m experiencing performance issues to scale it. I know that GPs are heavy on computation requirements and don’t scale nice (being O(n3)). Even so, I would expect to be able to get them to a certain scale. Right now, I am modeling more or less 50 time series (which I would like to grow to the few hundreds), each one with roughly 100 points (I’m focused on short time series, so I’m not expecting to get much larger than this).
I can’t really sample using MCMC (would take days from my attempts). I can only use MAP to perform some tests, but it is not really what I would want. I already tried to use ADVI, but the results were not good.
One thing that I did was to ensure that my priors were as informative as I can have them to be. I also already checked some of the posts here and understood that, for now, there is no current solution to use a sparse approximation with latent GPs.
Does someone has any ideas that could help to increase the model performance?
What is the shape of
B in your Coregional GP?
I was just running the model for 12 time series. Considering this number, I defined a (12,12) matrix for W and a vector of size 12 for kappa. Looking at the
PyMC3 source code, the B matrix is expanded based on my X, which is, in this case, (576,2). Something along the lines:
X = np.concatenate((np.repeat(np.arange(n).reshape(-1,1), s, axis=1).flatten('F').reshape(-1,1), np.repeat(np.arange(s),n).reshape(-1,1)), axis=1)
W = np.random.normal(loc=0, scale=0.5, size=(12,12))
kappa = stats.gamma.rvs(a=2, size=12)
W = tt.as_tensor_variable(W)
kappa = tt.as_tensor_variable(kappa)
B = tt.dot(W, W.T) + tt.diag(kappa)
index = tt.cast(X[:,1], "int32")[:,None]
index2 = index.T
B = B[index, index2]
m = plt.imshow(B.eval(), cmap="inferno", interpolation='none')
So, its shape is (576,576).
Yep the shape should be in the range of what pymc3 could handle - i would check to make sure there is no undesired broadcasting happening during the computation of the likelihood.
I will recheck everything.
What would be the size where things get hairy?
Hi @junpenglao, I made it work but now I am applying it to a dataset with 304 time series with 220 time points. So B is now (66880, 66880). It does not seem to be doable.
Is there any other way to approach this? For instance, instead of flattenning the data, use it in matrix form and estimate the GP parameters specifically for each time series (avoiding the need to build such a big matrix full of zeros). The dependencies could also be estimated in different GPs by group and added to the former. Does it make sense? Any way to use the GP API?
It might not be a very realistic assumption that every time point in your 304 time series are correlated with each other, instead, how about modeling a single (meta) time series with GP, and another correlation matrix to capture the correlation among time series (but each time slice will share the same correlation in that case)
And to hop on to @junpenglao’s response, I think you could try the Kronecker functionality for that. I think it should be computationally tractable using that.
You are right @junpenglao, but modeling a meta time series is also not a realistic assumption for my problem. So I went for a hybrid solution, modeling a GP for each hierarchy that I have in the data. It works well and it is fast for the small dataset (30+ time series). It is also fast enough for the bigger one (304 time series) if I don’t use all the hierarchies. If I add one of the bigger ones (with 30 and 70+ groups on each one - meaning 30 and 70+ new GPs added to the model) and try find_MAP it just stays hanging endlessly. There is no error, it just won’t start. One core of the CPU stays at 100% (others at zero) and it uses 12Gb of RAM more or less (which is not bad considering the size of the model but now we also have much smaller cov matrices). Do you know what could be happening?
@bwengals thanks for the tip, I had that as a next step as soon as I could manage to implement the above solution with much smaller cov matrices to begin with. I was thinking of using it for the high level hierarchies with a smaller number of groups, since I am a bit afraid of running again into giant cov matrices. Does it make sense?