I posted yesterday and was helped to get a nice hierarchical model that measures the probability of some event occurring across multiple instances. I’d like to expand this to hundreds of thousands of instances, which will require minibatches. As I understand it, minibatching requires variational inference, i.e. ADVI.
Before trying minibatches, I fit the model with NUTS, where it works well, and compared the result to ADVI. FullRankADVI consistently fails to capture the correct proportion. Does anyone have experience or advice that might point to what’s going wrong?
The example code is short, but you can also just look at this gist, which includes the relevant traceplots:
PS. I should add that I’m only interested in the group parameter here. So it may also work to fit ~500 instances at a time sequentially, using iterative prior/posterior updates à la https://docs.pymc.io/notebooks/updating_priors.html, and to bootstrap the large dataset. But I’m still interested in using ADVI if possible.
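
To make the sequential idea in the PS concrete, here is a minimal sketch of iterative prior/posterior updating. It is not my actual hierarchical model: it assumes a single pooled event probability with a Beta prior and 0/1 outcomes, so each update is conjugate and exact. The names `update_beta` and `sequential_posterior`, the batch size of 500, and the simulated data are all made up for illustration. A non-conjugate model would need some approximation of the posterior at each step instead.

```python
import random


def update_beta(alpha, beta, successes, trials):
    """Conjugate update: Beta(alpha, beta) prior plus binomial data."""
    return alpha + successes, beta + (trials - successes)


def sequential_posterior(outcomes, batch_size, alpha=1.0, beta=1.0):
    """Feed the data in batches; each posterior becomes the next prior."""
    for start in range(0, len(outcomes), batch_size):
        batch = outcomes[start:start + batch_size]
        alpha, beta = update_beta(alpha, beta, sum(batch), len(batch))
    return alpha, beta


# Simulated 0/1 outcomes (hypothetical data, true probability 0.3).
rng = random.Random(0)
true_p = 0.3
outcomes = [1 if rng.random() < true_p else 0 for _ in range(5000)]

# Batch-by-batch updating, starting from a flat Beta(1, 1) prior.
seq_alpha, seq_beta = sequential_posterior(outcomes, batch_size=500)

# Single update on the full dataset, for comparison.
full_alpha, full_beta = update_beta(1.0, 1.0, sum(outcomes), len(outcomes))

posterior_mean = seq_alpha / (seq_alpha + seq_beta)
print(seq_alpha, seq_beta, posterior_mean)
```

In this conjugate case the batched and full-data posteriors are identical, which is the property the sequential approach relies on. With a hierarchical model the per-step posterior has to be approximated, so some error can accumulate across batches.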