How to make Minibatch for multi-dimensional data?

One thing that puzzled me is that multiple pm.Minibatch objects are created. To keep the rows aligned, they would then have to sync their batch generation across the objects “behind the scenes”.

Your point about some of the behavior of multiple Minibatch streams is well taken; PyMC3 affords the user less granular control than Theano and some details are not as obvious.

Also, would it be better to use pm.Minibatch with a multi-dimensional numpy array? I suppose the batch-generator should be initialized as follows. But how do I get that data into the pymc3 model in the example above?

With regard to the multidimensional minibatch, you can create the minibatch variable inside the model and then index into it like so:

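import pymc3 as pm

# `data` is assumed to be a 2-D NumPy array: column 0 holds X, column 1 holds Y.
with pm.Model() as model:
    # One minibatch over the rows: 128 rows per draw, all columns kept.
    batch = pm.Minibatch(data=data, batch_size=128)
    X_batch = batch[:, 0]  # first column of the current batch
    Y_batch = batch[:, 1]  # second column, same rows as X_batch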

I’m not sure whether there is a performance difference between calling pm.Minibatch twice and creating it once and indexing into it afterwards, but it may be worth testing.
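
As a rough illustration of why indexing one batch keeps X and Y paired, here is a minimal sketch in plain NumPy. It does not use or reproduce PyMC3's internals. The toy `data` array, the `draw_batch` helper, and the seed are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: column 0 plays the role of X, column 1 is Y = 2 * X.
x = np.arange(1000, dtype=float)
data = np.column_stack([x, 2.0 * x])


def draw_batch(array, batch_size, rng):
    """Pick `batch_size` random rows (with replacement) from a 2-D array."""
    idx = rng.integers(0, array.shape[0], size=batch_size)
    return array[idx]


# One draw, then split by column: both columns come from the same row indices.
batch = draw_batch(data, 128, rng)
X_batch = batch[:, 0]
Y_batch = batch[:, 1]
paired = bool(np.all(Y_batch == 2.0 * X_batch))

# Two independent draws pick different rows, so the pairing is lost.
X_alone = draw_batch(data[:, :1], 128, rng)[:, 0]
Y_alone = draw_batch(data[:, 1:], 128, rng)[:, 0]
misaligned = bool(np.any(Y_alone != 2.0 * X_alone))
```

With separate generators, the pairing only survives if both draw the same indices, which is the "sync" concern raised earlier in the thread.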

Note that creating the pm.Minibatch objects generates a Python warning when using PyMC3 v3.8 (latest). Is this something to worry about?

I’m not completely sure about this. You can read some discussion other users have had about it here, but for now it appears to be a harmless Theano warning.
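
If the warning is just noise, one option is to silence it locally with the standard library's `warnings` module. This is only a sketch: the thread does not say which warning category is raised, so `FutureWarning` is an assumption, and `build_minibatch_stand_in` is a hypothetical placeholder for the real pm.Minibatch call.

```python
import warnings


def build_minibatch_stand_in():
    """Hypothetical stand-in for a call that emits a harmless warning."""
    warnings.warn("stand-in deprecation notice", FutureWarning)
    return "built"


# Ignore only FutureWarning (assumed category), and only inside this block;
# the previous warning filters are restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    result = build_minibatch_stand_in()
```

Scoping the filter to a `with` block keeps other warnings visible elsewhere in the script.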