Minibatch and NUTS

erlendd · June 10, 2018, 10:38pm

Am I correct in saying that the minibatch mode doesn’t work with NUTS?

narendramukherjee · June 11, 2018, 1:03am

I am actually not sure if plain minibatching would work with any MCMC method for that matter. Detailed balance would likely be messed up if different parts of the data are exposed to the sampler at different iterations.
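To make this concern concrete, here is a toy chain small enough to solve exactly. It is only a sketch under assumed values and does not use PyMC: the two candidate parameter values, the three Bernoulli observations, and the helper names are all hypothetical. It compares the stationary distribution of a Metropolis sampler that sees the full data with one that sees a single, rescaled observation per iteration.

```python
import math

# Hypothetical toy problem: theta is one of two success probabilities,
# with a flat prior, and we observe three Bernoulli draws.
THETAS = (0.3, 0.7)
DATA = (1, 1, 0)


def loglik(theta, ys):
    """Bernoulli log-likelihood of observations ys under theta."""
    return sum(math.log(theta if y == 1 else 1.0 - theta) for y in ys)


def flip_prob(src, dst, batches, scale):
    """Chance of accepting the move src -> dst, averaged over equally
    likely batches, with the batch log-likelihood multiplied by scale."""
    total = 0.0
    for batch in batches:
        log_ratio = scale * (loglik(dst, batch) - loglik(src, batch))
        total += min(1.0, math.exp(log_ratio))
    return total / len(batches)


def stationary_prob_high(batches, scale):
    """Stationary probability of the larger theta for the two-state chain
    whose only proposal is to flip to the other state."""
    lo, hi = THETAS
    up = flip_prob(lo, hi, batches, scale)
    down = flip_prob(hi, lo, batches, scale)
    return up / (up + down)


# Full-data Metropolis: one "batch" containing everything, no rescaling.
exact = stationary_prob_high([DATA], scale=1.0)

# Naive minibatch Metropolis: one observation per step, rescaled by N.
naive = stationary_prob_high([(y,) for y in DATA], scale=len(DATA))

print(exact, naive)
```

With these toy numbers the full-data chain targets a posterior probability of about 0.70 for the larger value, while the naive minibatch chain settles at about 0.64. Each individual move is reversible with respect to its own batch-specific target, but the mixture of those moves does not leave the full posterior invariant.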

That being said, what can definitely work is a step-by-step updating of the posterior with small subsets of the data. That is, starting with a prior, you expose a small subset of the data to get an intermediate posterior. This intermediate posterior then becomes your prior when the next subset of data is seen, and so on. The practical problem is specifying the intermediate posterior as a prior - it is hard to do so without making any assumptions. One potential way would be to assume that the new prior lies in the same family as the original prior - then you can match some moments (e.g. mean and variance) of the intermediate posterior and use them to parameterise the new prior. I do not know if PyMC3 has an easy way of doing this; there was a similar discussion about ADVI here:
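As an aside on the moment-matching idea in this post, the sketch below shows the batch-by-batch update on a conjugate Beta-Bernoulli model. It is an assumption-laden illustration, not a PyMC feature: the data, the batch size, and the helper names are hypothetical, and the model is chosen so that the intermediate posterior really does lie in the prior's family.

```python
def beta_update(a, b, batch):
    """Conjugate update of a Beta(a, b) prior with 0/1 observations."""
    successes = sum(batch)
    return a + successes, b + len(batch) - successes


def beta_moments(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var


def beta_from_moments(mean, var):
    """Method-of-moments Beta parameters for a given mean and variance."""
    nu = mean * (1.0 - mean) / var - 1.0  # equals a + b
    return mean * nu, (1.0 - mean) * nu


# Hypothetical data, split into three batches of four observations.
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1]
batches = [data[i:i + 4] for i in range(0, len(data), 4)]

a, b = 1.0, 1.0  # flat Beta(1, 1) prior
for batch in batches:
    a_post, b_post = beta_update(a, b, batch)
    # Pretend only the moments of the intermediate posterior are known
    # (as they would be from MCMC draws) and rebuild the next prior.
    a, b = beta_from_moments(*beta_moments(a_post, b_post))

sequential = (a, b)
full = beta_update(1.0, 1.0, data)  # all data at once, for comparison
print(sequential, full)
```

Here the sequential result matches the full-data posterior because the family is closed under updating. For a model like a hierarchical logistic regression the intermediate posterior would not be in the prior's family, so the same step would be an approximation whose error can accumulate across batches.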

cshenton · June 11, 2018, 1:34am

There is some work in the literature on mini-batch MCMC but suffice to say you’d need to write your own inference algorithm there. Within PyMC your best bet for a large data set is minibatch variational inference.

What I was referring to in the linked post is something more like Streaming Variational Bayes. However that’s not available in any of the major probabilistic programming languages. The problem with SVB is that the inner loop (run on each mini batch) requires convergence of your optimiser, which in practice is not as fast as just training an SGD optimiser against the entire dataset.

Can you share some details about the model you’re trying to fit?

junpenglao · June 11, 2018, 5:43am

Speaking of this, I remember I had this twitter conversation with Dan Simpson last year:
https://twitter.com/trailofdan/status/920961099863871489

Also, pseudomarginal methods tend to take a geometrically ergodic algorithm and make it no longer geometrically ergodic. And not GE = not useful because there’s no central limit theorem. (That’s not strictly true. But it’s almost true. Most of the time you don’t luckily land in one of the bigger classes of non-GE Markov chains that still satisfy a CLT)

The papers recommended were: "Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels" (arXiv:1403.5496). Some theoretical ideas are hiding here: "Coupled MCMC with a randomized acceptance probability" (arXiv:1205.6857).

erlendd · June 11, 2018, 11:44am

I don’t see why subsampling would break detailed balance. Surely, as long as individual moves are reversible, we’re OK in this regard?

The system is a hierarchical logistic regression model. I have a lot of data (about 200k samples, a single random effect, i.e. a grouping variable, and quite a few fixed effects). Right now I’m using ADVI, but I’m aware that it ignores correlations in the posterior distribution.

nmrobert · June 11, 2018, 12:01pm

Hi erlendd - have you looked at FullRankADVI, or normalizing flows to add structure to your posterior? You can do something like NFVI with some number of householder transforms (plus scale/loc) if you don’t want to go all the way to FullRankADVI (or if you want something even more complex).
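For readers unfamiliar with the idea, the sketch below illustrates why a few Householder reflections sit between mean-field and full-rank approximations. It is plain Python rather than the PyMC implementation, and the scales, the reflection vector, and the ordering (scale first, then reflections, with the location shift omitted because it does not affect covariance) are assumptions made for illustration. In a real flow the reflection vectors would be learned.

```python
def householder(v, z):
    """Reflect z across the hyperplane orthogonal to v:
    z - 2 * v * (v . z) / (v . v)."""
    vz = sum(a * b for a, b in zip(v, z))
    vv = sum(a * a for a in v)
    return [zi - 2.0 * vi * vz / vv for vi, zi in zip(v, z)]


def implied_covariance(scales, vs):
    """Covariance of H_k ... H_1 diag(scales) eps for eps ~ N(0, I),
    where each H_i is the Householder reflection defined by vs[i]."""
    dim = len(scales)
    # Build the columns of the linear map A = H_k ... H_1 diag(scales)
    # by pushing each scaled basis vector through the reflections.
    cols = []
    for j in range(dim):
        col = [scales[j] if i == j else 0.0 for i in range(dim)]
        for v in vs:
            col = householder(v, col)
        cols.append(col)
    # Covariance is A A^T.
    return [[sum(cols[k][i] * cols[k][j] for k in range(dim))
             for j in range(dim)]
            for i in range(dim)]


scales = [2.0, 0.5]  # hypothetical per-dimension standard deviations

mean_field = implied_covariance(scales, [])              # no reflections
one_reflection = implied_covariance(scales, [[1.0, 0.5]])

print(mean_field)
print(one_reflection)
```

Without reflections the covariance is diagonal, as in mean-field ADVI. A single reflection already produces a non-zero off-diagonal term while adding only one vector of parameters, and stacking more reflections lets the approximation represent richer correlation structure.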

erlendd · June 14, 2018, 10:05am

Actually I’ve always had problems with FullRankADVI - it runs and about 1/3 of the way through starts giving NaNs.

nmrobert · June 14, 2018, 10:37am

I’ve found that reducing the learning rate and/or examining the scaling of the parameters can help with the -inf/NaN. Agreed in general though - it often goes off track, and that’s why I tend to use NFVI with some number of HH (Householder) transforms to move between totally correlation-free posteriors and full-rank ones.

junpenglao · June 14, 2018, 11:15am

Do you have some examples of using NFVI? The other day I was just discussing with @ferrine that we don’t have a good practical example of it.

nmrobert · June 14, 2018, 6:43pm

Hi @junpenglao, I don’t have any illustrative examples that would be really ideal for teaching people about the areas that NFVI is useful in. Mostly just really domain-specific use cases :S Sorry.

Often I find myself just using it in a situation kinda like the OP’s - where I want to use ADVI, but FullRank is unstable and I anticipate only needing a ‘bit’ of correlation (in some hazy, unspecified way), or I feel wild and adventurous and want to try using planar flows to model bimodality (note: this never really works). I imagine a good tutorial might have to be some kinda standard problem (GLM? Hierarchical model?) that we engineer back-to-front to be mildly ill-suited to vanilla ADVI, and walk people through the differing complexity of approximations available and show the way it moves us toward a more exact inference like NUTS.

colcarroll · June 14, 2018, 6:58pm

This is off base from the main discussion here, but I did a few basic experiments of my own for mean-field vs full-rank ADVI earlier this year: https://gist.github.com/ColCarroll/d673a3af7169bd713bcbdb9445d4a543

Dan Lee, and then Dan Simpson, hopped in a discussion about it as well, which I found helpful: https://twitter.com/colindcarroll/status/967078763384201216

gBokiau · June 22, 2018, 8:04pm

Re: NaNs with FullRank. I think this is because the Cholesky decomposition is not stabilised in the current implementation. A notebook with a failing case like that would be great to let me confirm that this is indeed the issue.
