I’m working on a time series model with many parameters. When I call pm.sample()
with the default args, ADVI “converges” after about 20k steps, and the subsequent HMC runs at about 1.4 iterations/second. It looks like the ADVI callback inside init_nuts is:
```python
cb = [
    pm.callbacks.CheckParametersConvergence(tolerance=1e-2, diff='absolute'),
    pm.callbacks.CheckParametersConvergence(tolerance=1e-2, diff='relative'),
]
```
When I instead use pm.ADVI() with a single callback, pm.callbacks.CheckParametersConvergence(diff='absolute', tolerance=1e-6), ADVI runs for upwards of 400k steps. The loss improves only slowly after 100k, but it does still improve.
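Concretely, the longer run looks like this (a sketch: `model` stands in for my actual model, and the 500k cap is arbitrary):

```python
import pymc3 as pm

with model:
    advi = pm.ADVI()
    approx = advi.fit(
        n=500000,  # generous cap; the convergence callback can stop it earlier
        callbacks=[
            pm.callbacks.CheckParametersConvergence(diff='absolute', tolerance=1e-6),
        ],
    )
```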
Three questions:
1. Is it a bad idea to let ADVI run for longer before switching to HMC? In other words, is there a good reason that the callback in init_nuts is what it is?
2. If the answer to 1. is ‘No, you should try running ADVI for longer to see if it speeds up your HMC’, is there an easier way to do this than what I’m planning, which is to mimic the code in that block? (See the sketch after this list.)
3. I suspect HMC is slow because some of my latent variables are correlated. In my model, there are two possible ‘causes’ for some data points: think of a state space model where a point could be a jump in state vs. an additive outlier (toy example below). Are there best practices/recommended reading for this situation?
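For question 2, this is roughly the plan, copied by analogy from the init='advi' branch of init_nuts (the `approx.bij.rmap`/`model.dict_to_array` lines come from the PyMC3 source I'm reading and may differ across versions):

```python
import pymc3 as pm

with model:
    approx = pm.fit(
        n=400000,
        method='advi',
        callbacks=[
            pm.callbacks.CheckParametersConvergence(diff='absolute', tolerance=1e-6),
        ],
    )
    # one starting point per chain, as init_nuts does
    start = list(approx.sample(draws=4))
    # diagonal scaling built from the ADVI posterior stds, as init_nuts does
    stds = approx.bij.rmap(approx.std.eval())
    cov = model.dict_to_array(stds) ** 2
    step = pm.NUTS(scaling=cov, is_cov=True)
    # passing `step` explicitly should skip the default init entirely
    trace = pm.sample(1000, step=step, start=start, chains=4)
```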
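And to make question 3 concrete, here is a toy version of the structure I mean (not my real model; all names and numbers are made up). A large residual can be explained either by the random walk jumping or by the outlier component, so the two explanations trade off against each other in the posterior:

```python
import numpy as np
import pymc3 as pm

# fake data: a random walk observed with noise, plus a few large outliers
np.random.seed(0)
T = 200
true_state = np.cumsum(np.random.normal(0.0, 0.5, size=T))
y = true_state + np.random.normal(0.0, 0.1, size=T)
outlier_idx = np.arange(10, T, 37)
y[outlier_idx] += np.random.normal(0.0, 5.0, size=outlier_idx.size)

with pm.Model() as toy:
    sd_state = pm.HalfNormal('sd_state', sd=1.0)
    sd_obs = pm.HalfNormal('sd_obs', sd=1.0)
    sd_outlier = pm.HalfNormal('sd_outlier', sd=10.0)
    p_outlier = pm.Beta('p_outlier', alpha=1.0, beta=9.0)
    state = pm.GaussianRandomWalk('state', sd=sd_state, shape=T)
    # each point is either ordinary noise around the state or an
    # additive outlier -- the jump-vs-outlier ambiguity from question 3
    pm.Mixture(
        'y',
        w=pm.math.stack([1.0 - p_outlier, p_outlier]),
        comp_dists=[
            pm.Normal.dist(mu=state, sd=sd_obs),
            pm.Normal.dist(mu=state, sd=sd_outlier),
        ],
        observed=y,
    )
```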
Thanks in advance.