I noticed that the default initialization procedure is no longer ADVI but `jitter+adapt_diag`. Could anyone explain this in simple terms, and why it is preferred to initializing with ADVI?
With `adapt_diag` enabled, NUTS changes the (diagonal) mass matrix during tuning to match the variance of the posterior samples so far (using sliding windows so that early nonsense doesn't mess everything up). In cases where the ADVI solution is very different from the actual posterior, this can improve mixing a great deal. Basically, `adapt_diag` is more robust than ADVI.
We also noticed that running both ADVI and mass matrix adaptation isn't worth it most of the time, especially when taking into account the compilation time for ADVI. In some very large models this might be different; if so, you can set the init method to `advi+adapt_diag`.
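For anyone who wants to compare the two, here is a minimal sketch; the model itself is a made-up toy, and the `init` argument of `pm.sample` is the real knob being discussed:

```python
import pymc3 as pm

# Toy model purely to illustrate switching initialization strategies.
with pm.Model() as model:
    mu = pm.Normal("mu", mu=0.0, sd=1.0)
    obs = pm.Normal("obs", mu=mu, sd=1.0, observed=[0.1, -0.3, 0.2])

    # The new default: jittered start + diagonal mass matrix adaptation.
    trace_default = pm.sample(init="jitter+adapt_diag")

    # For very large models, seeding the mass matrix from ADVI's
    # variance estimate may still pay off.
    trace_advi = pm.sample(init="advi+adapt_diag")
```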
If you know of cases where the effective number of samples per unit time (the minimum of `pm.effective_n(trace)` divided by the total sampling time) decreased because of this change, we'd be interested to hear about it.
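In case it helps, a rough sketch of how one might measure that metric, assuming a `model` is already defined (`pm.effective_n` returns per-variable effective sample sizes, which can be arrays for vector-valued variables):

```python
import time
import numpy as np
import pymc3 as pm

with model:
    start = time.time()
    trace = pm.sample(init="jitter+adapt_diag")
    elapsed = time.time() - start

# Smallest effective sample size across all variables and dimensions,
# normalized by total sampling wall time.
n_eff = pm.effective_n(trace)
min_ess = min(np.min(v) for v in n_eff.values())
print(min_ess / elapsed)
```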
The `jitter` means that the starting position, for parameters where it isn't specified explicitly, is drawn from uniform(-1, 1) on the transformed space, so that different chains use different initial parameters.
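A conceptual sketch of the jitter (this is not the library's internal code, just an illustration of the idea):

```python
import numpy as np

# Each chain gets an independent uniform(-1, 1) draw for every free
# parameter on the transformed space, so no two chains start at the
# same point.
n_chains, n_free_params = 4, 3
rng = np.random.RandomState(0)
starts = rng.uniform(-1.0, 1.0, size=(n_chains, n_free_params))
print(starts)  # one row per chain
```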
Thanks for the explanation!
I can actually test whether `advi+adapt_diag` is worth it, because I'm running a very large model at the moment. I'll set another version going and report back once it's done.
Turns out that for my models at least (which are huge), it is definitely worth initializing with ADVI. In fact, without ADVI I often get the "bad initial energy" error at some point during sampling.
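For what it's worth, a quick way to start investigating "bad initial energy" errors (assuming a PyMC3 `model` is in scope) is to inspect the log-probability of each variable at the starting point; a `-inf` or `nan` entry usually points at the culprit:

```python
# Log-probability of every variable evaluated at the model's test point.
print(model.check_test_point())
```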