Comparing ADVI and MCMC models using WAIC or LOO

I want to compare the Widely Applicable Information Criterion (WAIC) of linear regression models fitted using Automatic Differentiation Variational Inference (ADVI) and MCMC sampling.

Since the WAIC computation is coupled to the trace, I sampled from the ADVI approximation and used the resulting trace as the input for the WAIC calculation.

However, I get a surprising result, where WAIC(MCMC) << WAIC(ADVI). Is this accurate?

My code:

import pymc3 as pm

# x and y are the observed data arrays

with pm.Model() as model:
    # Define priors
    sigma = pm.HalfCauchy('sigma', beta=10, testval=1.)
    intercept = pm.Normal('Intercept', 0, sd=20)
    x_coeff = pm.Normal('x', 0, sd=20)

    mu = intercept + x_coeff * x

    # Define likelihood
    likelihood = pm.Normal('y', mu=mu, sd=sigma, observed=y)

    # Inference!
    trace = pm.sample(3000, njobs=1)
    approx = pm.fit(method='advi')

waic_mcmc = pm.waic(trace, model)
waic_advi = pm.waic(approx.sample(3000), model)

Is it possible to calculate WAIC for ADVI using the method I followed?

If it is possible, and the WAIC scores are accurate, how should we interpret this result?

I'd appreciate any help in understanding these observations.


If you are trying to compare the fit of the same model between MCMC and VI, you should consult this new paper (with a blog post discussion here).

WAIC and LOO are generally used to compare across a set of models. That said, I would expect MCMC to fit better and give a lower WAIC value. I suggest checking your model fitting and making sure both inferences converge.


This is great. The paper describes something similar to what I wanted.

Thanks a lot junpenglao :smile:

Good point about convergence, @junpenglao. I do know that with large data sets, ADVI is more appropriate than MCMC approaches. So if ADVI converges but MCMC struggles because it can't handle the data, ADVI would give a better fit and thus a lower WAIC than MCMC.

That is a good observation, @tobib. Now I understand what could happen.

Could you please clarify what you meant by large data sets (is it the number of samples or the number of dimensions)? Are there other scenarios where MCMC struggles but ADVI converges faster?

@junpenglao, can we measure convergence with respect to time (or the number of iterations/samples) in PyMC3?

If you have a lot of observations, NUTS can take a long time due to the evaluation of logp and dlogp, but with ADVI you can use minibatches, which converge faster in terms of raw speed. However, NUTS tends to give much better answers than ADVI, as shown in the paper above.

There is no good indicator of convergence in real problems (you can argue that even in ADVI, the ELBO reaching a plateau does not necessarily mean it has converged), so it is hard to measure speed in that regard.


@junpenglao thanks for helping me understand.

@Nadheesh When I said large data sets, that would be in any sense - number of samples, dimensions, or anything else that causes the data set to take up lots of space. I would refer to page 335 and maybe page 337 in BDA3 for more detail.


Thanks @tobib. I'll refer to the book.