Adaptive Minibatch size

Hi,

I’m using ADVI with Minibatches and I observed two things.

One is that with small batch_size values I get to relatively good convergence quickly, whereas with large batch sizes it takes much longer.

The other thing is that the variance of the ELBO trace with small batch sizes is much larger than with bigger batch sizes. This actually creates convergence problems for me: the parameters fluctuate so much that CheckParametersConvergence never detects convergence.
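For reference, my setup is roughly the following (toy data and model only, to illustrate what I mean; my real model is more involved):

import numpy as np
import pymc3 as pm

# Toy stand-in for my actual data and model.
data = np.random.randn(10000)
minibatch = pm.Minibatch(data, batch_size=100)

with pm.Model() as model:
    mu = pm.Normal('mu', 0, 10)
    sigma = pm.HalfNormal('sigma', 1)
    pm.Normal('obs', mu, sigma, observed=minibatch, total_size=len(data))

    approx = pm.fit(
        n=50000,
        method='advi',
        callbacks=[pm.callbacks.CheckParametersConvergence(diff='relative')],
    )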

So my idea is to use an adaptive batch size for Minibatches: start with a small batch size to get to a good estimate of the parameters quickly, then increase the batch size later on to get stable estimates.
Do you think this approach makes sense?
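For concreteness, the kind of schedule I have in mind would be something like this (the numbers are just placeholders):

def batch_size_schedule(iteration, start=32, stop=2048, growth=2, every=5000):
    # Double the batch size every `every` iterations: small batches for fast
    # initial progress, larger batches for low-variance ELBO estimates later.
    return min(stop, start * growth ** (iteration // every))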

I tried to figure out how to change the batch_size of a pm.Minibatch, and what I came up with makes me wonder whether there is a better way of doing this:

def change_minibatch_size(minibatch, size):
    # Walk down the Theano graph behind the minibatch (see the debugprint
    # below) and overwrite, in place, the constant that holds the batch size.
    minibatch.minibatch.owner.inputs[1].owner.inputs[0].owner.inputs[0] \
        .owner.inputs[0].owner.inputs[0].owner.inputs[1].data[0] = size

I got to this solution after looking at the graph:

theano.printing.debugprint(minibatch)

ViewOp [id A] 'Minibatch'   
 |AdvancedSubtensor1 [id B] ''   
   |<TensorType(int64, vector)> [id C]
   |Reshape{1} [id D] ''   
     |Elemwise{Cast{int64}} [id E] ''   
     | |Elemwise{add,no_inplace} [id F] ''   
     |   |Elemwise{mul,no_inplace} [id G] ''   
     |   | |mrg_uniform{TensorType(float64, vector),no_inplace}.1 [id H] ''   
     |   | | |<TensorType(int32, matrix)> [id I]
     |   | | |TensorConstant{(1,) of 10} [id J]
     |   | |InplaceDimShuffle{x} [id K] ''   
     |   |   |Elemwise{sub,no_inplace} [id L] ''   
     |   |     |UndefinedGrad [id M] ''   
     |   |     | |Elemwise{sub,no_inplace} [id N] ''   
     |   |     |   |Elemwise{Cast{float64}} [id O] ''   
     |   |     |   | |Subtensor{int64} [id P] ''   
     |   |     |   |   |Shape [id Q] ''   
     |   |     |   |   | |<TensorType(int64, vector)> [id C]
     |   |     |   |   |Constant{0} [id R]
     |   |     |   |TensorConstant{1e-16} [id S]
     |   |     |UndefinedGrad [id T] ''   
     |   |       |Elemwise{Cast{float64}} [id U] ''   
     |   |         |TensorConstant{0.0} [id V]
     |   |InplaceDimShuffle{x} [id W] ''   
     |     |UndefinedGrad [id T] ''   
     |MakeVector{dtype='int64'} [id X] ''   
       |Subtensor{int64} [id Y] ''   
         |Shape [id Z] ''   
         | |Elemwise{Cast{int64}} [id E] ''   
         |Constant{0} [id BA]
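If poking at the graph like this is acceptable at all, I imagine driving it from a callback, since pm.fit calls each callback as callback(approx, losses, i). Continuing the toy setup from above (untested sketch; the schedule numbers are placeholders):

def grow_batch_size(approx, losses, i):
    # Bump the batch size at fixed iterations using the hack above.
    schedule = {10000: 200, 30000: 1000}  # iteration -> new batch size
    if i in schedule:
        change_minibatch_size(minibatch, schedule[i])

with model:
    approx = pm.fit(
        n=50000,
        method='advi',
        callbacks=[
            grow_batch_size,
            pm.callbacks.CheckParametersConvergence(diff='relative'),
        ],
    )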

I am wondering whether you can set the minibatch size as a theano.shared variable and adjust it during training, similar to the learning rate decay in https://docs.pymc.io/notebooks/lda-advi-aevb.html?highlight=lda#AEVB-with-ADVI
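Mechanically I am picturing something like the following, with the shared variable updated from Python during training; whether pm.Minibatch can actually consume a shared batch size like this is exactly the part I am not sure about:

import theano

# Hypothetical: keep the batch size in a shared variable...
batch_size = theano.shared(10, name='batch_size')

# ...and update it from Python (e.g. from a callback) as training progresses,
# analogous to the learning-rate decay in the linked notebook.
batch_size.set_value(500)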

As for the correctness of doing so, I really have no intuition. It sounds plausible, so I will be very interested to hear back about your results.
