I’d like to use minibatch to train some models that are essentially an ‘outer product’ of two vectors (shapes n and m respectively), where the observation is a giant sparse Poisson count matrix.
The dimensions m and n are very large, so I can’t hold the whole thing in memory.
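For concreteness, the structure I mean looks like this (a toy numpy illustration with made-up sizes and priors; the real n and m are far too large to materialise the rate matrix like this):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 4                       # tiny stand-ins for the huge real dims
u = rng.gamma(2.0, 1.0, size=n)   # length-n positive factor
v = rng.gamma(2.0, 1.0, size=m)   # length-m positive factor
rate = np.outer(u, v)             # (n, m) rate matrix: the 'outer product'
y = rng.poisson(rate)             # the (sparse, giant) observed count matrix
```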
- Has anyone gotten minibatch to work for simple MAP estimation? I imagine the find_MAP() function won’t work here, and we’d need a MAP estimator that uses the variational API with, e.g., SGD.
- Do we have any good examples of using minibatch approaches on “wide” models? My intuition is that results will be highly sensitive to the batching shape in such models.
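The kind of SGD-based estimator I have in mind can be sketched in plain numpy (a toy sketch, not PyMC code: it does maximum likelihood on randomly subsampled entries of the count matrix, so no gradient step ever touches the full (n, m) matrix; add log-prior gradients for a true MAP, and all sizes and priors here are made up):

```python
import numpy as np

# y_ij ~ Poisson(u_i * v_j); optimise log(u) and log(v) by SGD on
# randomly subsampled entries. Add log-prior terms for a true MAP.
rng = np.random.default_rng(0)
n, m = 100, 150
u_true = rng.gamma(2.0, 0.5, size=n)
v_true = rng.gamma(2.0, 0.5, size=m)
y = rng.poisson(np.outer(u_true, v_true))  # stand-in for the sparse data

log_u = np.zeros(n)
log_v = np.zeros(m)
lr, batch = 0.02, 256
mse0 = np.mean((np.exp(log_u)[:, None] * np.exp(log_v)
                - np.outer(u_true, v_true)) ** 2)
for _ in range(3000):
    i = rng.integers(0, n, size=batch)          # sampled row indices
    j = rng.integers(0, m, size=batch)          # sampled column indices
    rate = np.exp(log_u[i] + log_v[j])
    g = y[i, j] - rate            # d/d log(u_i) of [y*log(rate) - rate]
    np.add.at(log_u, i, lr * g)   # accumulate gradients per index
    np.add.at(log_v, j, lr * g)
mse1 = np.mean((np.exp(log_u)[:, None] * np.exp(log_v)
                - np.outer(u_true, v_true)) ** 2)
assert mse1 < mse0  # fit improved over the flat initialisation
```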
One idea I had is to somehow use hierarchical parameters to model the “mean field” effect of the parameters and observations that are “not in memory” at a given moment. It’s not obvious to me how to do the modeling and also how to do the inference with the machinery that we have in place currently.
If you have a hyper-prior in your model for the two vectors, then it should capture the effect of the parameters and observations that are “not in memory” at a given moment.
However, I am not sure it is easy to use the minibatch to index the random sample, as you will have a different batch size for the two vectors, right? Maybe it is easier to create your own minibatch generator (example here).
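A custom generator along those lines could look like this (a hypothetical sketch in plain numpy: it samples row and column indices independently, so the two vectors get different batch sizes, and yields the matching sub-block of the observation matrix):

```python
import numpy as np

def minibatch_generator(counts, rows_per_batch, cols_per_batch, seed=None):
    """Yield (row_idx, col_idx, block) triples, sampling rows and columns
    independently so the two vectors can have different batch sizes."""
    rng = np.random.default_rng(seed)
    n, m = counts.shape
    while True:
        row_idx = rng.choice(n, size=rows_per_batch, replace=False)
        col_idx = rng.choice(m, size=cols_per_batch, replace=False)
        # np.ix_ builds the open mesh, so we get the full sub-block.
        yield row_idx, col_idx, counts[np.ix_(row_idx, col_idx)]

# Usage: 2 rows and 3 columns per batch from a 4x5 matrix.
counts = np.arange(20).reshape(4, 5)
gen = minibatch_generator(counts, rows_per_batch=2, cols_per_batch=3, seed=0)
rows, cols, block = next(gen)
assert block.shape == (2, 3)
```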
I don’t understand your setup yet. What exactly is so large that it doesn’t fit in memory? If it is just the observations, then you could also move the likelihood function into a custom op that calls out to MPI or similar to do the computation in a distributed manner.
I guess a minibatch approach would be faster, but I don’t think we have a minibatch algorithm for find_MAP yet (not sure how difficult it would be to add this).
It is primarily the observations that need minibatching. However, the model has intermediate quantities with large dimensions (e.g. the outer product of the m- and n-dimensional vectors) that are also big and benefit from minibatching.
I see. If you want to do that with minibatches, then you’d probably have to do some work on your own. Maybe @ferrine has an idea how we could reuse some of the optimisation code from ADVI for this.
Have you tried to avoid computing the outer product? In many cases you can get around that by using associativity of matrix multiplications.
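For example (a generic numpy illustration, with made-up sizes): if the outer product only ever appears multiplied against something else, regrouping the product means the (m, n) matrix is never materialised:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 1000, 2000, 3
u = rng.standard_normal(m)
v = rng.standard_normal(n)
X = rng.standard_normal((n, k))

# Naive: materialise the (m, n) outer product, then multiply: O(m*n) memory.
naive = np.outer(u, v) @ X
# Regrouped: (u v^T) X = u (v^T X); only an (m,) and a (k,) vector are formed.
fast = u[:, None] * (v @ X)[None, :]
assert np.allclose(naive, fast)
```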