# of ADVI samples before HMC; Sampling with correlated latent vars

I’m working on a time series model with many parameters. When I call pm.sample() with the default args, ADVI “converges” after about 20k steps, and then HMC runs at about 1.4 iterations/second. It looks like the callback for ADVI is

    cb = [
        pm.callbacks.CheckParametersConvergence(tolerance=1e-2, diff='absolute'),
        pm.callbacks.CheckParametersConvergence(tolerance=1e-2, diff='relative'),
    ]

When I use pm.ADVI() with a callback of pm.callbacks.CheckParametersConvergence(diff='absolute', tolerance=1e-6), ADVI runs for upwards of 400k steps. The loss improves slowly after 100k, but it still improves.
Three questions:

  1. Is it a bad idea to let ADVI run for longer before switching to HMC? In other words, is there a good reason that the callback in init_nuts is what it is?
  2. If the answer to 1. is ‘No, you should try running ADVI for longer to see if it speeds up your HMC’, is there an easier way to do this than what I’m planning to do, which is to mimic the code in this block?
  3. I suspect HMC is slow because some of my latent variables are correlated. In my model, there are two possible ‘causes’ for some data points - think of a state space model where a point could be a jump in state vs an additive outlier. Are there best practices/recommended reading for this situation?

Thanks in advance.

The default number of iterations is enough for most of our test cases. If you want to increase it, you will need to mimic the code block from pm.sampling; currently there is no easy way to change it.
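To see why a tolerance of 1e-2 stops so much earlier than 1e-6, here is a minimal sketch of this kind of stopping rule in plain Python. It is a simplified stand-in, not PyMC3's implementation: the helper names, the `every=100` check interval, and the toy parameter trajectory are all assumptions made for illustration.

```python
def max_abs_change(current, previous):
    """Largest absolute change in any parameter since the last check."""
    return max(abs(c - p) for c, p in zip(current, previous))


def max_rel_change(current, previous, eps=1e-6):
    """Largest relative change; eps guards against division by zero."""
    return max((abs(c - p) + eps) / (abs(p) + eps)
               for c, p in zip(current, previous))


def steps_until_converged(params_at, tolerance, diff, every=100,
                          max_steps=1_000_000):
    """Return the first checked step whose change is below tolerance."""
    previous = params_at(0)
    for step in range(every, max_steps + 1, every):
        current = params_at(step)
        if diff(current, previous) < tolerance:
            return step
        previous = current
    return None  # tolerance never met within max_steps


def toy_params(step):
    # Hypothetical optimizer trajectory: two parameters creeping
    # towards 1.0 and -2.0, with the remaining error shrinking slowly.
    scale = 1.0 / (1.0 + step / 100.0)
    return [1.0 - scale, -2.0 + 2.0 * scale]


loose = steps_until_converged(toy_params, 1e-2, max_abs_change)
tight = steps_until_converged(toy_params, 1e-6, max_abs_change)
rel = steps_until_converged(toy_params, 1e-2, max_rel_change)
print("absolute, tol=1e-2:", loose)
print("absolute, tol=1e-6:", tight)
print("relative, tol=1e-2:", rel)
```

On this toy trajectory the loose tolerance stops roughly a hundred times earlier than the tight one, even though the parameters are still drifting. That mirrors the 20k versus 400k behaviour in the question, and says nothing by itself about whether the extra steps help NUTS.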

In general, ADVI initialization is good to some extent. The problem is that it underestimates the variance, which leads the subsequent NUTS run to use a small step size. So it is hard to say for sure whether running ADVI longer would improve NUTS.
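The variance underestimate can be made concrete with a textbook special case. For a Gaussian target, the factorized (mean-field) Gaussian that minimizes KL(q || p) has variances equal to the reciprocals of the diagonal of the target's precision matrix. The sketch below applies that result to a two-variable target with unit variances and correlation `rho`. This is an idealized illustration, not ADVI run on an actual model, and the function names are hypothetical.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mean_field_variances(cov):
    """Variances of the best factorized Gaussian for a Gaussian target.

    Uses the known closed form: variance_i = 1 / precision_ii.
    """
    precision = invert_2x2(cov)
    return [1.0 / precision[0][0], 1.0 / precision[1][1]]


for rho in (0.0, 0.5, 0.9, 0.99):
    cov = [[1.0, rho], [rho, 1.0]]  # true marginal variances are 1.0
    approx_var = mean_field_variances(cov)
    print(f"rho={rho}: mean-field variances = {approx_var}")
```

With a correlation of 0.9 the factorized approximation reports a variance of about 0.19 where the true marginal variance is 1, and the gap widens as the correlation grows. Running the optimizer for more steps does not close this gap, because it comes from the factorized form of the approximation.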

If you have correlated latent variables, you should try to reparameterize the model. Maybe you can try modelling the additive outlier cases as a state as well?

Thanks for the reply!
Regarding re-parameterization: as far as I can tell from reading the literature on this problem (time series with both structural breaks and additive outliers), there’s no way around the correlated variables, and the solution is specialized sampling algorithms (e.g. this paper).