Comparing ADVI and MCMC models using WAIC or LOO

I want to compare the Widely Applicable Information Criterion (WAIC) of linear regression models fitted using Automatic Differentiation Variational Inference (ADVI) and MCMC sampling.

Since the WAIC computation is coupled to the trace, I sampled from the ADVI approximation and used the resulting trace as the input for the WAIC calculation.

However, I get a surprising result, where WAIC(MCMC) << WAIC(ADVI). Is this accurate?

My code:

import pymc3 as pm

# x and y are the observed data arrays

with pm.Model() as model:
    # Define priors
    sigma = pm.HalfCauchy('sigma', beta=10, testval=1.)
    intercept = pm.Normal('Intercept', 0, sd=20)
    x_coeff = pm.Normal('x', 0, sd=20)

    mu = intercept + x_coeff * x

    # Define likelihood
    likelihood = pm.Normal('y', mu=mu, sd=sigma, observed=y)

    # Inference!
    trace = pm.sample(3000, njobs=1)
    approx = pm.fit(method='advi')

waic_mcmc = pm.waic(trace, model)
waic_advi = pm.waic(approx.sample(3000), model)

Is it possible to calculate WAIC for ADVI using the method I followed?

If it is possible, and the WAIC scores are accurate, how should we interpret this result?

I'd appreciate any help in understanding these observations.


If you are trying to compare the fit of the same model between MCMC and VI, you should consult this new paper (with a blog post discussion here).

WAIC and LOO are generally used to compare across a set of models. That said, I would expect MCMC to fit better and give a lower WAIC value. I suggest checking your model fitting and making sure both inferences converge.


This is great. The paper describes something similar to what I wanted.

Thanks a lot junpenglao :smile:

Good point about convergence, @junpenglao. I do know that with large data sets, ADVI is more appropriate than MCMC approaches. So if ADVI converges but MCMC struggles because it can't handle the data, ADVI would give a better fit and thus a lower WAIC than MCMC.

That is a good observation, @tobib. Now I understand what could happen.

Could you please clarify what you meant by large data sets (is it the number of samples or the number of dimensions)? Are there other scenarios where MCMC struggles but ADVI converges faster?

@junpenglao, can we measure convergence with respect to time (or the number of iterations/samples) in PyMC3?

If you have a lot of observations, NUTS can take a long time due to the evaluation of logp and dlogp, but with ADVI you can use minibatches, which converge faster in terms of raw speed. However, NUTS tends to give much better answers than ADVI, as shown in the paper above.

There is no good indicator of convergence in real problems (you can argue that even in ADVI, the ELBO reaching a plateau does not necessarily mean it has converged), so it is hard to measure speed in that regard.


@junpenglao thanks for helping me understand.

@Nadheesh When I said large data sets, that would be in any sense - number of samples, dimensions, or anything else that causes the data set to take up lots of space. I would refer to page 335 and maybe page 337 in BDA3 for more detail.


Thanks @tobib. I'll refer to the book.