How to make Minibatch for multi-dimensional data?

ckrapu · May 8, 2020, 5:53pm

One thing that puzzled me was that multiple pm.Minibatch objects are created, because they would then have to sync their batch generations between the multiple objects “behind the scenes”.

Your point about some of the behavior of multiple Minibatch streams is well taken; PyMC3 affords the user less granular control than Theano and some details are not as obvious.

Also, would it be better to use pm.Minibatch with a multi-dimensional numpy array? I suppose the batch-generator should be initialized as follows. But how do I get that data into the pymc3 model in the example above?

With regard to the multidimensional minibatch, you can create the minibatch variable inside the model and then index into it like so:

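import pymc3 as pm

# data: NumPy array of shape (n_samples, 2); column 0 is X, column 1 is Y
with pm.Model() as model:
    # An integer batch_size subsamples rows only, so each batch has shape (128, 2).
    # In a list, a tuple such as (128, 2) is read as (size, random_seed), not as a shape.
    batch = pm.Minibatch(data=data, batch_size=128)
    X_batch = batch[:, 0]
    Y_batch = batch[:, 1]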

I’m not sure whether there is a performance difference between calling pm.Minibatch twice and creating it once and then indexing into it, but it may be worth testing.
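
On the syncing concern, here is a rough NumPy-only sketch of why drawing rows once and then splitting columns keeps X and Y paired, while two independent streams stay aligned only if they share a seed. It does not use or reproduce PyMC3's internals; the toy `data` array and the `random_rows` helper are made up for illustration.

```python
import numpy as np

# Toy data of shape (1000, 2): column 0 is x, column 1 is y = 10 * x,
# so a correctly paired row always satisfies y == 10 * x.
x = np.arange(1000)
data = np.column_stack([x, 10 * x])


def random_rows(n_rows, batch_size, seed):
    """Return random row indices (hypothetical helper, not a PyMC3 function)."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_rows, size=batch_size)


# One stream: draw row indices once, then split the batch into columns.
idx = random_rows(len(data), 128, seed=0)
batch = data[idx]
x_batch, y_batch = batch[:, 0], batch[:, 1]
single_stream_paired = bool(np.all(y_batch == 10 * x_batch))

# Two streams: each column gets its own index generator.
idx_x = random_rows(len(data), 128, seed=1)
idx_y_same = random_rows(len(data), 128, seed=1)  # same seed as the x stream
idx_y_diff = random_rows(len(data), 128, seed=2)  # different seed
same_seed_paired = bool(np.all(data[idx_y_same, 1] == 10 * data[idx_x, 0]))
diff_seed_paired = bool(np.all(data[idx_y_diff, 1] == 10 * data[idx_x, 0]))

print(single_stream_paired, same_seed_paired, diff_seed_paired)
```

In this toy setup the single-stream batch is paired by construction, whereas the two-stream version depends entirely on both generators producing the same indices.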

Note that creating the pm.Minibatch objects generates a Python warning when using pymc3 v. 3.8 (latest). Is this something to worry about?

I’m not completely sure about this. Other users have discussed it, and it appears to be a harmless Theano warning for now.
