Hmm, it seems there is still some additional compilation when you first call advi.fit().
It depends on which part of the speed you want to measure. If you call advi.fit(1) beforehand, so that all the loss functions are already set up, the next call might measure the raw speed of the optimization, but I am not completely sure about this.
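
Something like this could separate the two timings. It is only a rough sketch: time_fit is a made-up helper name, and it assumes the first call absorbs the one-off setup cost, which I have not verified.

```python
import time


def time_fit(fit, n_iter, warmup_iter=1):
    """Call fit(warmup_iter), then fit(n_iter), and time each call.

    Returns (warmup_seconds, run_seconds).
    """
    start = time.perf_counter()
    fit(warmup_iter)  # first call: assumed to include any one-off compilation
    warmup_seconds = time.perf_counter() - start

    start = time.perf_counter()
    fit(n_iter)  # second call: hopefully mostly the optimization itself
    run_seconds = time.perf_counter() - start

    return warmup_seconds, run_seconds


# Hypothetical usage, assuming `advi` was already created inside your model:
# warmup_s, run_s = time_fit(advi.fit, 10000)
# print(f"warm-up: {warmup_s:.2f}s, optimization: {run_s:.2f}s")
```

If the warm-up time comes out much larger than you would expect for a single iteration, that would suggest the difference is compilation rather than optimization. Other overhead in the second call (the progress bar, for example) is still included in its timing.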