Comparing ADVI and MCMC models using WAIC or LOO

Nadheesh · March 29, 2018, 5:11am

I want to compare the Widely Applicable Information Criterion (WAIC) of linear regression models fitted with Automatic Differentiation Variational Inference (ADVI) and with MCMC sampling.

Since the WAIC computation takes a trace as input, I drew samples from the ADVI approximation and passed the resulting trace to the WAIC calculation.

However, I get surprising results: WAIC(MCMC) << WAIC(ADVI). Is this accurate?

My code:

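import pymc3 as pm

# x and y are assumed to be 1-D NumPy arrays (predictor and response) defined earlier
with pm.Model() as model: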
    # Define priors
    sigma = pm.HalfCauchy('sigma', beta=10, testval=1.)
    intercept = pm.Normal('Intercept', 0, sd=20)
    x_coeff = pm.Normal('x', 0, sd=20)

    mu = intercept + x_coeff * x

    # Define likelihood
    likelihood = pm.Normal('y', mu=mu, sd=sigma, observed=y)

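    # Inference: NUTS for MCMC, mean-field ADVI (the pm.fit default) for VI
    trace = pm.sample(3000, njobs=1)
    approx = pm.fit(1000)  # 1000 ADVI optimisation iterations

waic_mcmc = pm.waic(trace, model)
waic_advi = pm.waic(approx.sample(3000), model)  # 3000 draws from the ADVI approximation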
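For context on what the WAIC calculation does with the draws it receives: WAIC only needs the pointwise log-likelihoods log p(y_i | θ_s) evaluated at posterior draws θ_s, so draws from `approx.sample()` can be plugged in the same way as an MCMC trace. The NumPy sketch below implements the standard WAIC formula on the deviance scale (lower is better); it is not PyMC3's source code, and the toy data and "posterior" draws in the usage part are made up for illustration.

```python
import numpy as np


def waic(log_lik):
    """WAIC (deviance scale, lower is better) from pointwise log-likelihoods.

    log_lik has shape (S, N): log_lik[s, i] = log p(y_i | theta_s) for draw s
    and observation i. The formula does not care whether the draws came from
    MCMC or from a variational approximation.
    Returns (waic, p_waic, elpd_i), where elpd_i are the pointwise values.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # lppd_i = log( mean_s p(y_i | theta_s) ), via a numerically stable log-sum-exp
    m = log_lik.max(axis=0)
    lppd_i = m + np.log(np.exp(log_lik - m).sum(axis=0)) - np.log(n_draws)
    # p_waic_i = variance of log p(y_i | theta) across draws; draws that are too
    # narrow or shifted change this penalty as well as lppd_i
    p_waic_i = log_lik.var(axis=0, ddof=1)
    elpd_i = lppd_i - p_waic_i
    return -2.0 * elpd_i.sum(), p_waic_i.sum(), elpd_i


# Toy usage (made-up data): y ~ Normal(mu, 1) with sigma known, so the only
# parameter is mu and p_waic should come out close to 1.
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=50)
# Draws from the exact posterior of mu under a flat prior: Normal(mean(y), 1/N)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(y.size), size=(2000, 1))
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws) ** 2
waic_value, p_waic, elpd_i = waic(log_lik)
print(f"WAIC = {waic_value:.1f}, p_waic = {p_waic:.2f}")
```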

Is it possible to calculate WAIC for ADVI using the method I followed?

If it is possible, and the WAIC scores are accurate, then how can we interpret such observations?

I'd appreciate any help in understanding these observations.

Thanks

junpenglao · March 29, 2018, 5:50am

If you are trying to compare how well MCMC and VI fit the same model, you should consult this new paper: https://arxiv.org/abs/1802.02538 (with a blog post discussion here: http://andrewgelman.com/2018/02/07/eid-ma-clack-shaw-zupoven-del-ba/).

WAIC and LOO are generally used to compare a set of different models. That said, I would expect MCMC to fit better and give a lower WAIC value. I suggest checking your model fit and making sure both inference methods have converged.
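Because WAIC is meant for ranking models fitted to the same observations, comparisons are usually made on the pointwise differences, which also yields a standard error for the gap. A minimal sketch, assuming you already have pointwise elpd values (for example the per-observation elpd from a WAIC routine) for two models scored on the same data; the numbers in the usage line are invented.

```python
import numpy as np


def waic_difference(elpd_a, elpd_b):
    """WAIC(A) - WAIC(B) on the deviance scale, plus the standard error of that difference.

    elpd_a and elpd_b are pointwise elpd_waic values for the SAME N observations.
    A negative difference favours model A, since lower WAIC is better.
    """
    elpd_a = np.asarray(elpd_a, dtype=float)
    elpd_b = np.asarray(elpd_b, dtype=float)
    if elpd_a.shape != elpd_b.shape:
        raise ValueError("both models must be scored on the same observations")
    n = elpd_a.size
    diff_i = -2.0 * (elpd_a - elpd_b)  # pointwise difference, deviance scale
    d_waic = diff_i.sum()
    # Both models are scored on the same observations, so the SE is computed
    # from the pointwise differences rather than by combining two separate SEs.
    se = np.sqrt(n * diff_i.var(ddof=1))
    return d_waic, se


# Invented pointwise values for two models scored on four observations
d, se = waic_difference([-1.0, -1.2, -0.9, -1.1], [-1.3, -1.2, -1.4, -1.0])
print(f"dWAIC = {d:.2f} +/- {se:.2f}")
```

A difference that is small relative to its standard error is weak evidence for preferring either model.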

Nadheesh · March 29, 2018, 6:06am

This is great. The paper describes something similar to what I wanted.

Thanks a lot, junpenglao.

tobib · March 29, 2018, 8:45pm

Good point about convergence, @junpenglao. I do know that with large data sets, ADVI is more appropriate than MCMC approaches. So if ADVI converges but MCMC struggles because it can't handle the data, ADVI would give a better fit, and thus a lower WAIC, than MCMC.

Nadheesh · March 30, 2018, 3:59am

That is a good observation, @tobib. Now I understand what could be happening.

Could you please tell me what you meant by large data sets (the number of samples or the number of dimensions)? Are there any other scenarios where MCMC struggles but ADVI converges faster?

@junpenglao, is there a way to measure convergence with respect to time (or the number of iterations/samples) in PyMC3?

junpenglao · March 30, 2018, 8:28am

If you have a lot of observations, NUTS can take a long time because logp and dlogp have to be evaluated over the whole data set, whereas ADVI can use minibatches and so converges faster in terms of raw speed. But NUTS tends to give much better answers than ADVI, as shown in the paper above.
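The minibatch speed-up comes from evaluating the likelihood on a random subset of B observations per step and rescaling it by N/B, so each step costs O(B) instead of O(N) while still targeting the full-data objective on average. A NumPy sketch of that rescaled estimator for the linear-regression likelihood in the question; the data, batch size, and parameter values are made up, and the code does not use PyMC3. In PyMC3 the same rescaling is requested by passing `total_size` to the observed variable when using `pm.Minibatch`.

```python
import numpy as np


def normal_loglik(y, mu, sigma):
    """Elementwise log density of Normal(mu, sigma**2) at y."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2


def minibatch_loglik(x, y, intercept, slope, sigma, batch_size, rng):
    """Unbiased estimate of the full-data log-likelihood from one random minibatch.

    Only batch_size observations are touched; the sum over them is scaled by
    N / batch_size so that its expected value equals the sum over all N points.
    """
    n = y.size
    idx = rng.choice(n, size=batch_size, replace=False)
    mu = intercept + slope * x[idx]
    return (n / batch_size) * normal_loglik(y[idx], mu, sigma).sum()


# Made-up data: N = 100,000 observations from y = 1 + 2x + noise
rng = np.random.default_rng(1)
N = 100_000
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=N)

full = normal_loglik(y, 1.0 + 2.0 * x, 0.5).sum()  # touches all N points
# Averaging many minibatch estimates here only checks unbiasedness; stochastic
# optimisation uses a single noisy estimate per step.
estimates = [minibatch_loglik(x, y, 1.0, 2.0, 0.5, 500, rng) for _ in range(200)]
print(full, np.mean(estimates))
```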

There is no good indicator of convergence in real problems (you could argue that even in ADVI, the ELBO reaching a plateau does not necessarily mean it has converged), so it is hard to measure speed in that regard.
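One rough check people do use is whether the loss (negative ELBO) history, such as the one PyMC3 stores in `approx.hist`, has flattened out. The helper below is hypothetical, with an arbitrary window size and tolerance, and the loss curve in the usage part is made up; as the paragraph above says, a plateau only shows that the optimiser has stopped making progress, not that the approximation is accurate.

```python
import numpy as np


def elbo_plateaued(loss_history, window=500, rel_tol=1e-3):
    """Heuristic: has the (negative) ELBO stopped improving?

    Compares the mean loss over the last `window` iterations with the mean over
    the `window` iterations before that, and returns True if the relative change
    is below `rel_tol`. This says nothing about approximation quality.
    """
    loss = np.asarray(loss_history, dtype=float)
    if loss.size < 2 * window:
        return False  # not enough history to compare two windows
    recent = loss[-window:].mean()
    previous = loss[-2 * window:-window].mean()
    return bool(abs(recent - previous) / abs(previous) < rel_tol)


# Made-up loss curve: fast initial decrease, then noisy but flat
rng = np.random.default_rng(2)
it = np.arange(10_000)
loss = 1000 + 5000 * np.exp(-it / 300) + rng.normal(scale=2, size=it.size)
print(elbo_plateaued(loss))         # judged on the late, flat part of the curve
print(elbo_plateaued(loss[:1000]))  # judged while the loss is still dropping
```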

Nadheesh · April 2, 2018, 4:26am

@junpenglao, thanks for helping me understand.

tobib · April 2, 2018, 5:44pm

@Nadheesh When I said large data sets, I meant it in any sense: number of samples, number of dimensions, or anything else that makes the data set take up lots of space. I would refer to page 335, and maybe page 337, of BDA3 for more detail. https://www.amazon.com/Bayesian-Analysis-Chapman-Statistical-Science/dp/1439840954

Nadheesh · April 3, 2018, 4:16am

Thanks @tobib. I'll refer to the book.

Related topics

Topic	Category	Replies	Views	Activity
Poor Accuracy of ADVI for Linear Regression	Questions	12	3373	April 18, 2018
WAIC values differing between identical fits?	Questions	0	512	July 22, 2019
Unexpected Results when using WAIC to Compute Log Predictive Accuracy	Questions	3	672	May 13, 2018
Using ADVI for GMM	Questions	3	630	June 28, 2018
Collecting ELBO from ADVI	Questions	14	1458	April 5, 2018