One thing that puzzled me was that multiple pm.Minibatch objects are created, because they would then have to sync their batch generations between the multiple objects “behind the scenes”.
Your point about some of the behavior of multiple Minibatch streams is well taken; PyMC3 affords the user less granular control than Theano, and some details are not as obvious.
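To make the syncing question concrete, here is a minimal NumPy-only sketch; it is not PyMC3's actual implementation. It assumes each minibatch stream draws its row indices from its own random generator. The helper `draw_indices` and the seed values are hypothetical. Under that assumption, two streams stay row-aligned only when they are seeded identically.

```python
import numpy as np

def draw_indices(seed, n_rows, batch_size, n_batches):
    """Hypothetical stand-in for one minibatch stream: row indices of successive batches."""
    rng = np.random.default_rng(seed)
    return [rng.integers(0, n_rows, size=batch_size) for _ in range(n_batches)]

n_rows = 1000
x = np.arange(n_rows, dtype=float)
y = 2.0 * x  # each y row is tied to the x row with the same index

# Two streams with the same seed, and a third with a different seed
same_a = draw_indices(seed=42, n_rows=n_rows, batch_size=128, n_batches=3)
same_b = draw_indices(seed=42, n_rows=n_rows, batch_size=128, n_batches=3)
other = draw_indices(seed=7, n_rows=n_rows, batch_size=128, n_batches=3)

# Identical seeds: every y batch still matches its x batch
aligned = all(np.array_equal(y[ib], 2.0 * x[ia]) for ia, ib in zip(same_a, same_b))
# Different seeds: the x and y batches no longer refer to the same rows
misaligned = any(not np.array_equal(y[ib], 2.0 * x[ia]) for ia, ib in zip(same_a, other))
```

If separate streams were seeded differently, X and Y batches would silently pair unrelated rows, which is why the seeding behavior is worth checking before relying on two independent objects.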
Also, would it be better to use pm.Minibatch with a multi-dimensional numpy array? I suppose the batch-generator should be initialized as follows. But how do I get that data into the pymc3 model in the example above?
With regard to the multidimensional minibatch, you can create the minibatch variable inside the model and then index into it like so:
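import pymc3 as pm

# `data` is assumed to be a 2-D NumPy array: column 0 holds X, column 1 holds Y
with pm.Model() as model:
    batch = pm.Minibatch(data=data, batch_size=[(128, 2)])
    X_batch = batch[:, 0]  # first column of the current batch
    Y_batch = batch[:, 1]  # second column of the current batch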
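If the predictors and responses start out as separate one-dimensional arrays, one way to build the two-column `data` array is `np.column_stack`. The sketch below is plain NumPy and only imitates what a single batch draw looks like. The arrays `x` and `y`, the seed, and the batch size of 128 are illustrative assumptions. Its point is that slicing columns out of one batch keeps each X row paired with its Y row.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)   # illustrative predictor
y = 3.0 * x + 1.0           # illustrative response, tied row-by-row to x

# Stack the two 1-D arrays as columns of a single 2-D array
data = np.column_stack([x, y])  # shape (1000, 2)

# Imitate one batch draw: pick 128 random rows, then split the columns
rows = rng.integers(0, data.shape[0], size=128)
batch = data[rows]
X_batch = batch[:, 0]
Y_batch = batch[:, 1]
```

Because both columns come from the same set of rows, no separate synchronization between X and Y is needed.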
I’m not sure if there is a performance difference between using pm.Minibatch twice and creating it once and then indexing later, but it may be something worth testing.
Note that creating the pm.Minibatch objects generates a Python warning when using pymc3 v. 3.8 (latest). Is this something to worry about?
I’m not completely sure about this. You can read some discussion other users have had about it here but it appears to be a harmless Theano warning for now.
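If the warning does turn out to be harmless and you want quieter output, Python's standard `warnings` module can silence it locally. This is a sketch under assumptions: the thread does not state the warning's category, so `FutureWarning` is a guess, and `build_minibatches` is a hypothetical placeholder for whatever code creates the pm.Minibatch objects.

```python
import warnings

def build_minibatches():
    """Hypothetical placeholder: emits a warning the way the real call might."""
    warnings.warn("placeholder deprecation notice", FutureWarning)
    return "minibatches"

# record=True collects any warnings that get through, so we can inspect them
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")                           # let everything else through
    warnings.simplefilter("ignore", category=FutureWarning)   # assumed category
    result = build_minibatches()
```

The filters are restored when the `with` block exits, so warnings raised elsewhere in the script are unaffected.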