Adaptive Minibatch size


I’m using ADVI with Minibatches and I observed two things.

One is that with small batch_size values I reach reasonably good convergence quickly, whereas with large batch sizes it takes much longer.

The other thing is that the variance of the ELBO trace with small batch sizes is large compared to bigger batch sizes. This is actually creating convergence problems for me: the parameters fluctuate so much that CheckParametersConvergence never detects convergence.

So my idea is to use an adaptive batch size for pm.Minibatch: start with a small batch size to get a good estimate of the parameters quickly, then increase the batch size later on to get stable estimates.
Do you think this approach makes sense?
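For concreteness, the kind of schedule I have in mind could be sketched like this (a hypothetical example; `start`, `growth`, `increase_every`, and `max_size` are made-up parameters, not anything from PyMC):

```python
def batch_size_schedule(step, start=32, growth=2, increase_every=1000, max_size=10000):
    """Hypothetical schedule: start with a small batch size and multiply it
    by `growth` every `increase_every` steps, capped at the full data size."""
    return min(start * growth ** (step // increase_every), max_size)
```

So early steps would use cheap, noisy gradients, and later steps would use larger batches to damp the ELBO variance.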

I tried to figure out how to change the batch_size of a pm.Minibatch, and what I came up with makes me wonder if there is a better way of doing this:

def change_minibatch_size(minibatch, size):
    # Reach into the Theano graph behind the Minibatch and overwrite the
    # constant that controls how many rows are drawn per iteration.
    # This is very fragile: it depends on the exact graph structure below.
    minibatch.minibatch.owner.inputs[1].owner.inputs[0].owner.inputs[0] \
        .owner.inputs[0].owner.inputs[0].owner.inputs[1].data[0] = size

I got to this solution after looking at the graph:


ViewOp [id A] 'Minibatch'   
 |AdvancedSubtensor1 [id B] ''   
   |<TensorType(int64, vector)> [id C]
   |Reshape{1} [id D] ''   
     |Elemwise{Cast{int64}} [id E] ''   
     | |Elemwise{add,no_inplace} [id F] ''   
     |   |Elemwise{mul,no_inplace} [id G] ''   
     |   | |mrg_uniform{TensorType(float64, vector),no_inplace}.1 [id H] ''   
     |   | | |<TensorType(int32, matrix)> [id I]
     |   | | |TensorConstant{(1,) of 10} [id J]
     |   | |InplaceDimShuffle{x} [id K] ''   
     |   |   |Elemwise{sub,no_inplace} [id L] ''   
     |   |     |UndefinedGrad [id M] ''   
     |   |     | |Elemwise{sub,no_inplace} [id N] ''   
     |   |     |   |Elemwise{Cast{float64}} [id O] ''   
     |   |     |   | |Subtensor{int64} [id P] ''   
     |   |     |   |   |Shape [id Q] ''   
     |   |     |   |   | |<TensorType(int64, vector)> [id C]
     |   |     |   |   |Constant{0} [id R]
     |   |     |   |TensorConstant{1e-16} [id S]
     |   |     |UndefinedGrad [id T] ''   
     |   |       |Elemwise{Cast{float64}} [id U] ''   
     |   |         |TensorConstant{0.0} [id V]
     |   |InplaceDimShuffle{x} [id W] ''   
     |     |UndefinedGrad [id T] ''   
     |MakeVector{dtype='int64'} [id X] ''   
       |Subtensor{int64} [id Y] ''   
         |Shape [id Z] ''   
         | |Elemwise{Cast{int64}} [id E] ''   
         |Constant{0} [id BA]

I am wondering if you can set the minibatch size as a theano.shared variable and decay it during training, similar to how learning-rate decay is done.
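To illustrate the shared-variable idea without committing to PyMC internals, here is a plain-NumPy sketch of a minibatch sampler whose size lives in one mutable place, the way a theano.shared batch size would; `AdaptiveMinibatch` and its methods are made up for illustration, not part of any library:

```python
import numpy as np

class AdaptiveMinibatch:
    """Toy stand-in for a shared batch size: draw random rows from `data`,
    with `batch_size` changeable between draws."""
    def __init__(self, data, batch_size, seed=0):
        self.data = np.asarray(data)
        self.batch_size = batch_size           # analogous to a theano.shared
        self.rng = np.random.default_rng(seed)

    def set_batch_size(self, size):            # like shared_var.set_value(size)
        self.batch_size = size

    def draw(self):
        idx = self.rng.integers(0, len(self.data), size=self.batch_size)
        return self.data[idx]

data = np.arange(1000.0)
mb = AdaptiveMinibatch(data, batch_size=10)
small = mb.draw()            # shape (10,)
mb.set_batch_size(100)       # grow the batch size mid-training
large = mb.draw()            # shape (100,)
```

If pm.Minibatch accepted a shared variable for its size, a training callback could call set_value on it according to whatever schedule you choose.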

As for the correctness of doing so, I really have no intuition. It sounds plausible, so I would be very interested to hear back about your results.
