ADVI result systematically different to NUTS

Hi folks,
I posted yesterday and was helped to get a nice hierarchical model that measures the probability of some event occurring across multiple instances. I’d like to expand this to hundreds of thousands of instances, requiring minibatches. As I understand it, this requires variational inference, i.e. ADVI.

Before trying minibatches, I took the model that works well with NUTS and fitted the same model with ADVI for comparison. FullRankADVI consistently fails to recover the correct proportion. Does anyone have any experience or advice that would point to what’s going wrong?

The example code is short but you can just see this gist that includes the relevant traceplots:

Many thanks!

PS. I should add that I’m only interested in the group parameter here. So it may also work to fit ~500 instances at a time sequentially, using iterative prior/posterior updates a la - and bootstrapping the large dataset - but I’m still interested in using ADVI if possible.
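
To make the iterative prior/posterior idea concrete, here is a minimal sketch under simplifying assumptions: a single shared proportion with a conjugate Beta prior, rather than the hierarchical model from the gist. The batch counts and the helper name `update_beta` are made up for illustration. With a conjugate prior, feeding the posterior from one batch in as the prior for the next gives exactly the same answer as fitting all the data at once. For a non-conjugate hierarchical model, the posterior would have to be approximated between batches, so that equivalence would only hold approximately.

```python
# Sequential Bayesian updating for one proportion with a Beta prior.
# Hypothetical data: (successes, trials) per batch of instances.
batches = [(12, 50), (9, 50), (15, 50)]


def update_beta(alpha, beta, successes, trials):
    """Return the Beta posterior parameters after observing binomial data."""
    return alpha + successes, beta + (trials - successes)


# Start from a flat Beta(1, 1) prior and update one batch at a time,
# using each posterior as the prior for the next batch.
alpha, beta = 1.0, 1.0
for successes, trials in batches:
    alpha, beta = update_beta(alpha, beta, successes, trials)

posterior_mean = alpha / (alpha + beta)

# Fit all the data in a single step for comparison.
total_successes = sum(s for s, _ in batches)
total_trials = sum(n for _, n in batches)
alpha_all, beta_all = update_beta(1.0, 1.0, total_successes, total_trials)

print("sequential:", alpha, beta)
print("one shot:  ", alpha_all, beta_all)
print("posterior mean:", posterior_mean)
```

The sequential and one-shot parameters agree here only because the Beta prior is conjugate to the binomial likelihood.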

Hi @lewiso1!

In general, ADVI isn’t guaranteed to identify parameters correctly, or even produce the same results as MCMC. I like to refer to Betancourt’s SBC paper (specifically section 6.3), which shows that ADVI can fail even for a simple model like linear regression. I’ve linked the paper below.
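
As a small illustration of one well-known way a variational approximation can differ from the true posterior (this is not the example from the paper, and not a diagnosis of the model in this thread): for a correlated Gaussian target, the mean-field Gaussian that minimises KL(q || p) has variances equal to the reciprocal of the diagonal of the precision matrix, which is smaller than the true marginal variance whenever the correlation is non-zero. The sketch below assumes a zero-mean bivariate Gaussian target with unit marginal variances; the function name is made up for illustration.

```python
def mean_field_variance(rho):
    """Variance of each factor of the KL(q || p)-optimal mean-field Gaussian.

    Assumed target: bivariate Gaussian with unit marginal variances and
    correlation rho. Its precision matrix is
        1 / (1 - rho**2) * [[1, -rho], [-rho, 1]],
    and the optimal factorised Gaussian has variance 1 / precision[i][i].
    """
    precision_diag = 1.0 / (1.0 - rho ** 2)
    return 1.0 / precision_diag


true_marginal_variance = 1.0
for rho in (0.0, 0.5, 0.9, 0.99):
    mf_var = mean_field_variance(rho)
    print(f"rho={rho}: true variance={true_marginal_variance}, "
          f"mean-field variance={mf_var:.4f}")
```

A full-rank Gaussian would recover this particular target exactly, so the sketch only shows the mean-field case. Full-rank ADVI is still restricted to a Gaussian in the transformed parameter space, which may be a poor fit for skewed or funnel-shaped hierarchical posteriors.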


Thanks @_eigenfoo, that’s a nice example.

If anyone wanted to see the non-minibatch way of doing this (i.e. iteratively running a whole new model on ~500 samples at a time, but with the priors being informed by the previous run of the model), see: