I am trying to fit a Multivariate Gaussian Random Walk model to a collection of time series (about 2000 series with 30 time points each).
Right now, it is projecting 20 hours for the sampling to complete. I wonder if I can speed it up by using more cores on my machine.
From what I understand, each new core launches an additional chain. Right now I am using 8 cores to sample 8 chains with pm.sample(1000, tune=1000, cores=8). If I shorten the chains and run on all 16 cores, e.g., pm.sample(500, tune=500, cores=16), would I collect the same number of samples faster? Or does it not quite work that way?
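To make the comparison concrete, here is the back-of-the-envelope arithmetic behind my question, as a rough sketch only. It assumes every iteration costs the same, that each chain gets its own core, and that wall-clock time is proportional to tune + draws per chain. `sampling_plan` is a hypothetical helper I made up for illustration, not part of PyMC.

```python
def sampling_plan(draws, tune, chains, cores):
    """Return (total kept draws, relative wall-clock cost) for a plan.

    Assumptions (mine, not PyMC's): every iteration costs the same,
    and chains run in batches of `cores` chains at a time.
    """
    batches = -(-chains // cores)          # ceiling division
    total_draws = draws * chains           # post-tuning draws kept overall
    wall_clock = batches * (tune + draws)  # iterations run back to back
    return total_draws, wall_clock


# Current setup: 8 chains on 8 cores, 1000 tuning + 1000 kept draws each.
current = sampling_plan(draws=1000, tune=1000, chains=8, cores=8)

# Proposed setup: 16 chains on 16 cores, 500 tuning + 500 kept draws each.
proposed = sampling_plan(draws=500, tune=500, chains=16, cores=16)

print(current)   # (8000, 2000)
print(proposed)  # (8000, 1000)
```

Under these assumptions both plans keep 8000 draws and the second finishes in half the time, but I am not sure whether 500 tuning steps per chain are enough for the sampler to adapt, or whether the per-iteration cost really stays the same.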
As a follow-up, a lot of my time series are right-censored. I noticed that the more missing data I keep in my dataset, the slower the sampling gets. Is this expected behavior?
Any advice would be greatly appreciated. Thanks!