Hi,
The basic structure of my model is, for some M-by-N matrix A and a length-M vector of binary outcomes Y:
import pymc3 as pm

with pm.Model():
    x = pm.Normal('x', 0, 1, shape=N)
    combined = pm.math.dot(A, x)          # linear predictor, length M
    scaled = pm.math.sigmoid(combined)    # map to probabilities in (0, 1)
    outcomes = pm.Bernoulli('outcome', scaled, observed=Y)
    pm.fit(n=50000)
This works great as-is; however, inference is getting extremely slow as my data grows. I saw the pm.Minibatch class and some documentation around it, and it looked like exactly what I need. However, if I change the model to
Y_batch = pm.Minibatch(Y, batch_size=100)   # a random slice of 100 rows of Y

with pm.Model():
    x = pm.Normal('x', 0, 1, shape=N)
    combined = pm.math.dot(A, x)            # still length M; A is not minibatched
    scaled = pm.math.sigmoid(combined)
    outcomes = pm.Bernoulli('outcome', scaled, observed=Y_batch, total_size=Y.shape)
    pm.fit(n=50000)
I get the error
Input dimension mis-match. (input[0].shape[0] = 100, input[1].shape[0] = 10000)
I'm guessing this is because the quantity being observed here is not a scalar that we've observed many times, but rather a vector (scaled) with a one-to-one correspondence to rows of Y, so minibatching Y alone leaves pm.math.dot(A, x) at the full length of 10000. Is there any way to iterate through this in minibatches, e.g. by slicing A as well (see the sketch below)?
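In case it's useful, here is a self-contained sketch of what I'm trying to achieve: giving A its own pm.Minibatch so its rows stay matched up with Y_batch. I'm assuming that two Minibatch objects created with the same random_seed (both default to 42) draw the same rows on each step, which I haven't been able to confirm; the sizes M and N below are made up to match the error above.

import numpy as np
import pymc3 as pm

# Toy data standing in for my real A and Y (sizes are hypothetical)
M, N = 10000, 20
A = np.random.randn(M, N)
Y = np.random.randint(0, 2, size=M)

# Both minibatches use the default random_seed=42, which I assume
# keeps their row slices aligned with each other
A_batch = pm.Minibatch(A, batch_size=100)
Y_batch = pm.Minibatch(Y, batch_size=100)

with pm.Model():
    x = pm.Normal('x', 0, 1, shape=N)
    combined = pm.math.dot(A_batch, x)   # now length 100, matching Y_batch
    scaled = pm.math.sigmoid(combined)
    outcomes = pm.Bernoulli('outcome', scaled,
                            observed=Y_batch, total_size=Y.shape)
    approx = pm.fit(n=50000)

Is that the right approach, or is there a supported way to minibatch A and Y jointly?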
Edit: when testing with larger datasets, I also sometimes get the error
The current approximation of RV `x`.ravel()[8051] is NaN.
repeated at every index of x.