Multivariate Time Series very slow with sampling

Hi, I was hoping for some help figuring out why this model is so slow. It only samples about 2 draws per second, so it takes a very long time to generate anything useful.

Here is the code. There are 30 teams, each with an index in the dataset, and the observed values are integer valued. This is supposed to be a state space model, which is why the latent variable is modeled as an AR(1) (though because it’s multivariate I had to use AR rather than AR1).

import pymc3 as pm

# fg3m is the game-level DataFrame (team indices, game numbers, observed made threes)
with pm.Model() as fg3m_AR:
    # Per-team AR coefficients for offense and defense (30 teams)
    comb_off_coef = pm.Normal('comb_off_coef', mu=1, sigma=20, shape=(1, 30))
    comb_def_coef = pm.Normal('comb_def_coef', mu=1, sigma=20, shape=(1, 30))

    # Latent offensive/defensive strength over 81 games, constrained to be non-negative
    comb_off_ar1 = pm.Bound(pm.AR, lower=0.0)('comb_off_ar1', rho=comb_off_coef, sigma=10, shape=(81, 30))
    comb_def_ar1 = pm.Bound(pm.AR, lower=0.0)('comb_def_ar1', rho=comb_def_coef, sigma=10, shape=(81, 30))

    # Per-team observation noise
    comb_sd_star = pm.Gamma('comb_sd_star', alpha=1/3, beta=1/9, shape=30)

    # Expected made threes for the home and away team in each game
    home_theta = pm.Deterministic('home_theta', comb_off_ar1[fg3m.GAME_NUMBER_x.values, fg3m.i_home.values]
                                  + comb_def_ar1[fg3m.GAME_NUMBER_y.values, fg3m.i_away.values])

    away_theta = pm.Deterministic('away_theta', comb_off_ar1[fg3m.GAME_NUMBER_y.values, fg3m.i_away.values]
                                  + comb_def_ar1[fg3m.GAME_NUMBER_x.values, fg3m.i_home.values])

    y_home = pm.Normal('y_home', mu=home_theta, sd=comb_sd_star[fg3m.i_home.values], observed=fg3m.FG3M_x.values)
    y_away = pm.Normal('y_away', mu=away_theta, sd=comb_sd_star[fg3m.i_away.values], observed=fg3m.FG3M_y.values)

Hey, Aditya_Kaushik. A full explanation would take a while, but I used to wonder the same thing myself, so I will save you a lot of time by just letting you know that PyMC3 does not handle time series very well at the moment.

There is no inherent reason for this, other than that (A) reparametrizing time series efficiently is a difficult thing, and I believe Stan has much better time-series functionality largely because it handles this specific aspect better; and (B) the samplers in use were not built for time series to begin with (unlike SMC, which PyMC3 recently incorporated, but which still needs more work to handle time series as well as it hypothetically could). Reparametrizing is only a stop-gap for not having a dedicated sampler.
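To make the reparametrization point concrete, here is a minimal sketch of the non-centered approach for a latent series, using a Gaussian random walk (the rho = 1 special case of an AR(1)) so it can be vectorized with a cumulative sum. The shapes match the 81-game, 30-team setup above, but the names and priors are illustrative, not taken from your model.

import pymc3 as pm
import theano.tensor as tt

T, n_teams = 81, 30  # games x teams, as in the model above

with pm.Model() as noncentered_walk:
    sigma = pm.HalfNormal('sigma', 1.0)

    # Non-centered parameterization: sample standardized innovations...
    eps = pm.Normal('eps', mu=0.0, sigma=1.0, shape=(T, n_teams))

    # ...and build the latent random walk deterministically from them.
    latent = pm.Deterministic('latent', tt.cumsum(sigma * eps, axis=0))

For a general AR(1) coefficient you would still need a scan (or pm.AR itself), which is part of why this is awkward in PyMC3 right now.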

These are just some things I picked up over time, so I may be wrong, and I hope other users/developers can lend their opinions on the topic ^^; .

There are other threads on this forum, such as this one, that report similar findings. I wish you the best of luck!


Hi Gon_F, thank you very much for this response. You have saved me a lot of future headache. Would you recommend Stan in Python/R for the type of analysis I want to perform, then? Or are there other libraries that you think are better?

For Bayesian time-series analysis, I would really only recommend Stan at the moment; there is not much else that is as flexible and also fast enough to actually work. Stan takes more effort to work in, but there are enough examples of what you want to accomplish.

I haven’t actually played with the time series components in Stan; do they have some special API?

Otherwise, Bayesian state space modeling in TensorFlow Probability is really good in terms of design (you can express almost every kind of time series with it); however, it is currently a bit slow for large time series.
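For reference, the structural time series module in question is tfp.sts. A minimal sketch of the kind of setup it allows might look like the following, where the observed series and the choice of a single AR(1) component are purely illustrative.

import numpy as np
import tensorflow_probability as tfp

# Illustrative observed series, e.g. one team's made threes per game.
observed = np.random.poisson(lam=12.0, size=82).astype(np.float32)

# One AR(1) structural component; tfp.sts lets you sum several such components.
ar1 = tfp.sts.Autoregressive(order=1, observed_time_series=observed, name='ar1')
model = tfp.sts.Sum([ar1], observed_time_series=observed)

# Fit with HMC (this is the part that can be slow for long series, as noted above).
samples, kernel_results = tfp.sts.fit_with_hmc(model, observed_time_series=observed)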

Stan does not have any special API for them, but from what I’ve read, people seem to get better results with it for more complex models.

Not to hijack the thread here, but I wonder how much faster this would be if you were to write it in Stan. To me the problem is more the priors: more informative priors would likely help.
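As an illustration of what tighter priors could look like in the original model, something like the sketch below would be a starting point; the specific values are just a guess to show the idea, not a recommendation fitted to the data.

with pm.Model() as fg3m_AR_tighter:
    # AR coefficients concentrated near stationarity instead of Normal(1, 20)
    comb_off_coef = pm.Normal('comb_off_coef', mu=0.5, sigma=0.5, shape=(1, 30))
    comb_def_coef = pm.Normal('comb_def_coef', mu=0.5, sigma=0.5, shape=(1, 30))

    # Much smaller innovation scale than sigma=10 for per-game changes in team strength
    comb_off_ar1 = pm.Bound(pm.AR, lower=0.0)('comb_off_ar1', rho=comb_off_coef, sigma=1.0, shape=(81, 30))
    comb_def_ar1 = pm.Bound(pm.AR, lower=0.0)('comb_def_ar1', rho=comb_def_coef, sigma=1.0, shape=(81, 30))
    # ... rest of the model as above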