Increasing sampling speed using multiple cores

I am trying to fit a Multivariate Gaussian Random Walk model to a collection of time series (about 2000 series with 30 time points each).

Right now, the sampler estimates 20 hours to finish. I wonder whether I can speed it up by using more of the cores on my machine.

From what I understand, each additional core launches an additional chain. Right now I am using 8 cores to sample 8 chains with pm.sample(1000, tune=1000, cores=8). If I shorten the chains and use all 16 cores, e.g., pm.sample(500, tune=500, cores=16), would I collect the same number of draws (16 × 500 = 8 × 1000 = 8000) faster? Or does it not quite work that way?
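The draw arithmetic can be checked with a small toy model. This is a sketch, not PyMC code: SamplerConfig is a hypothetical helper that mirrors the default documented for pm.sample (when chains is not passed, it is set to cores or 2, whichever is larger), and it assumes each running chain gets its own physical core and every iteration costs the same.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamplerConfig:
    """Hypothetical stand-in for a few pm.sample arguments."""
    draws: int                    # kept draws per chain
    tune: int                     # tuning iterations per chain (discarded)
    cores: int                    # chains allowed to run in parallel
    chains: Optional[int] = None  # None mimics leaving chains unset

    def n_chains(self) -> int:
        # Documented pm.sample default: chains = max(cores, 2) when unset.
        return self.chains if self.chains is not None else max(2, self.cores)

    def total_draws(self) -> int:
        # Kept posterior draws summed over all chains.
        return self.n_chains() * self.draws

    def iterations_per_chain(self) -> int:
        # Each chain runs its own tuning phase before its kept draws.
        return self.tune + self.draws

    def waves(self) -> int:
        # Chains run `cores` at a time; ceiling division for leftovers.
        return -(-self.n_chains() // self.cores)

    def relative_wall_time(self) -> int:
        # Wall time in "iterations of one chain", assuming one physical core
        # per running chain and a constant cost per iteration.
        return self.waves() * self.iterations_per_chain()


current = SamplerConfig(draws=1000, tune=1000, cores=8)
proposed = SamplerConfig(draws=500, tune=500, cores=16)
for name, cfg in [("current", current), ("proposed", proposed)]:
    print(name, cfg.n_chains(), cfg.total_draws(), cfg.relative_wall_time())
# current 8 8000 2000
# proposed 16 8000 1000
```

Under these assumptions, both settings keep 8000 draws and the 16-core run takes roughly half the wall time, because each chain does 1000 iterations instead of 2000. The toy model leaves out things that matter in practice: halving tune also halves the adaptation each chain gets, which may be too short for a hard model; if some of the 16 cores are hyperthreads, each chain may run slower than with 8; and equal draw counts do not guarantee equal effective sample size.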

As a follow-up, a lot of my time series are right-censored. I noticed that the more missing data I keep in my dataset, the slower the sampling gets. Is this expected behavior?
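One plausible explanation, assuming the censored points are passed to observed= as NaN or as a masked array: PyMC then imputes them automatically, which adds one latent value per missing entry to the parameters NUTS has to explore. The sketch below only counts those extra dimensions; count_missing is a hypothetical helper, and the censoring pattern is simulated, not real data.

```python
import random
from typing import Sequence


def count_missing(n_times: int, censor_times: Sequence[int]) -> int:
    """Count entries after each series' censoring time (hypothetical helper).

    A series censored at time c is observed at 0..c-1 and missing at
    c..n_times-1; each missing entry would become one imputed latent value.
    """
    # Clamp censoring times into [0, n_times] before counting the tail.
    return sum(n_times - min(max(c, 0), n_times) for c in censor_times)


N_SERIES, N_TIMES = 2000, 30  # sizes from the question
rng = random.Random(0)

for frac in (0.0, 0.25, 0.5):
    # Simulated pattern: each series is censored with probability `frac`
    # at a uniformly random time; otherwise it is fully observed.
    censor_times = [
        rng.randint(1, N_TIMES - 1) if rng.random() < frac else N_TIMES
        for _ in range(N_SERIES)
    ]
    extra = count_missing(N_TIMES, censor_times)
    print(f"{frac:.0%} of series censored -> {extra} imputed values")
```

Each imputed value is one more dimension for the sampler, and these values are typically correlated with the random-walk states around them, so slower sampling as more missing data is kept would be expected under this assumption.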

Any advice would be greatly appreciated. Thanks!