How to interpret the different variance results from ADVI and NUTS (MCMC)

Nadheesh · May 16, 2018, 4:40am

This is more of a general question than a specific pymc3 question. I would appreciate it if someone could help me understand this.

I have observed that even though the ADVI and MCMC posterior distributions have their mode at the same point, their variances are significantly different. Usually MCMC has a high variance whereas ADVI has a low variance.

I know that MCMC draws samples from the exact target distribution, whereas ADVI tries to minimize the KL divergence between a proposed distribution and the target distribution. So is it safe to say that MCMC estimates the variance of the target distribution accurately?

narendramukherjee · May 16, 2018, 4:45am

ADVI (the mean-field version) often shows “mode-seeking” behavior, where the estimated posterior sticks to one of the modes of the real posterior. So yes, it isn’t surprising that it estimates a lower variance than MCMC. Now whether MCMC has “accurately” captured variance is a very tough question to answer - you likely have to do posterior predictive checks to get a handle on that.

That being said, doing full-rank ADVI should give you a better estimate of the posterior variance than mean-field.
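To make the mean-field versus full-rank difference concrete, here is a minimal sketch for the special case where the target is itself a correlated bivariate Gaussian. In that case the factorised Gaussian that minimises KL(q || p) is known in closed form: its variances are the reciprocals of the diagonal of the target's precision matrix, whereas a full-rank Gaussian can match the target exactly. The correlation value `rho = 0.9` is an illustrative assumption, not something from this thread, and real posteriors are generally not Gaussian.

```python
# Target: zero-mean bivariate Gaussian with unit marginal variances and
# correlation rho (illustrative value, assumed for this sketch).
rho = 0.9
cov = [[1.0, rho], [rho, 1.0]]

# Invert the 2x2 covariance by hand to get the precision matrix.
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
precision = [
    [cov[1][1] / det, -cov[0][1] / det],
    [-cov[1][0] / det, cov[0][0] / det],
]

# For a Gaussian target, the factorised (mean-field) Gaussian that
# minimises KL(q || p) has variances 1 / precision[i][i].
mean_field_var = [1.0 / precision[i][i] for i in range(2)]

# The true marginal variances are the diagonal of the covariance matrix.
true_marginal_var = [cov[i][i] for i in range(2)]

print("true marginal variances:", true_marginal_var)
print("mean-field variances:   ", mean_field_var)
```

In this setup the mean-field variances come out as 1 - rho**2, so the stronger the correlation the mean-field family cannot represent, the more the marginal variance is underestimated.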

junpenglao · May 16, 2018, 5:07am

It is a well-observed behaviour of ADVI. Quoting Dan Simpson:

For mean-field Gaussian your approximating family is a product of Gaussians on the two axes, which, for example, can’t approximate a narrow Gaussian concentrated around the line y=x.

For the full rank one, I’d expect it to be in the correct place, but the covariance matrix to be too “concentrated”. This is because the KL divergence is an asymmetric measure of “distance” between two probability distributions and in the direction that it is used for VI, it penalises approximations that are too diffuse far more fiercely than approximations that are too concentrated. This leads to a systematic underestimation of variation using VB methods.
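The asymmetry of the KL divergence described above can be checked numerically. The sketch below is only an illustration under assumed numbers: the target is a hypothetical two-component mixture with modes at -3 and 3, and two candidate approximations are compared, a narrow Gaussian sitting on one mode and a wide Gaussian that matches the target's overall mean and variance. The helper names (`normal_pdf`, `target_pdf`, `kl`) are made up for this example, and the integrals are plain Riemann sums rather than anything ADVI computes internally.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def target_pdf(x):
    """Hypothetical bimodal target: equal mixture of N(-3, 1) and N(3, 1)."""
    return 0.5 * normal_pdf(x, -3.0, 1.0) + 0.5 * normal_pdf(x, 3.0, 1.0)


def narrow_pdf(x):
    """Candidate that locks onto the right-hand mode."""
    return normal_pdf(x, 3.0, 1.0)


def wide_pdf(x):
    """Candidate matching the target's mean (0) and variance (10)."""
    return normal_pdf(x, 0.0, math.sqrt(10.0))


def kl(a_pdf, b_pdf, lo=-20.0, hi=20.0, n=40001):
    """Riemann-sum estimate of KL(a || b) on a uniform grid."""
    dx = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * dx
        a = a_pdf(x)
        if a > 0.0:
            total += a * math.log(a / b_pdf(x)) * dx
    return total


# Direction used by VI: KL(q || p), with the approximation q first.
reverse_narrow = kl(narrow_pdf, target_pdf)
reverse_wide = kl(wide_pdf, target_pdf)

# Opposite direction: KL(p || q), with the target p first.
forward_narrow = kl(target_pdf, narrow_pdf)
forward_wide = kl(target_pdf, wide_pdf)

print("KL(q || p): narrow =", reverse_narrow, " wide =", reverse_wide)
print("KL(p || q): narrow =", forward_narrow, " wide =", forward_wide)
```

With these assumed numbers, KL(q || p) is smaller for the narrow single-mode candidate, while KL(p || q) is smaller for the wide one, which is the preference for concentrated approximations that the quote describes.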

Of course, this is not to say that VI is always mode-seeking; for an example, see Kevin Murphy’s Machine Learning: A Probabilistic Perspective.

Nadheesh · May 16, 2018, 6:05am

Thanks guys, this is really helpful.
