Hi,
The basic structure of my model is, for some M-by-N matrix A and a length-M vector of binary outcomes Y:
import pymc3 as pm

with pm.Model():
    x = pm.Normal('x', 0, 1, shape=N)
    combined = pm.math.dot(A, x)          # linear predictor, length M
    scaled = pm.math.sigmoid(combined)    # map to probabilities in (0, 1)
    outcomes = pm.Bernoulli('outcome', scaled, observed=Y)
    pm.fit(n=50000)
This works great as-is; however, inference is getting extremely slow as my data grows. I saw the pm.Minibatch class and some documentation around it, and it looked like exactly what I need. However, if I change the model to
Y_batch = pm.Minibatch(Y, batch_size=100)   # a random slice of 100 rows of Y

with pm.Model():
    x = pm.Normal('x', 0, 1, shape=N)
    combined = pm.math.dot(A, x)            # still length M; A is not minibatched
    scaled = pm.math.sigmoid(combined)
    outcomes = pm.Bernoulli('outcome', scaled, observed=Y_batch, total_size=Y.shape)
    pm.fit(n=50000)
I get the error
Input dimension mis-match. (input[0].shape[0] = 100, input[1].shape[0] = 10000)
I'm guessing this is because the quantity being observed here is not a scalar that we've observed many times, but rather a vector (scaled) with a one-to-one correspondence to rows of Y, so minibatching Y alone leaves pm.math.dot(A, x) at the full length of 10000. Is there any way to iterate through this in minibatches, e.g. by slicing A as well (see the sketch below)?
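In case it's useful, here is a self-contained sketch of what I'm trying to achieve: giving A its own pm.Minibatch so its rows stay matched up with Y_batch. I'm assuming that two Minibatch objects created with the same random_seed (both default to 42) draw the same rows on each step, which I haven't been able to confirm; the sizes M and N below are made up to match the error above.

import numpy as np
import pymc3 as pm

# Toy data standing in for my real A and Y (sizes are hypothetical)
M, N = 10000, 20
A = np.random.randn(M, N)
Y = np.random.randint(0, 2, size=M)

# Both minibatches use the default random_seed=42, which I assume
# keeps their row slices aligned with each other
A_batch = pm.Minibatch(A, batch_size=100)
Y_batch = pm.Minibatch(Y, batch_size=100)

with pm.Model():
    x = pm.Normal('x', 0, 1, shape=N)
    combined = pm.math.dot(A_batch, x)   # now length 100, matching Y_batch
    scaled = pm.math.sigmoid(combined)
    outcomes = pm.Bernoulli('outcome', scaled,
                            observed=Y_batch, total_size=Y.shape)
    approx = pm.fit(n=50000)

Is that the right approach, or is there a supported way to minibatch A and Y jointly?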
Edit: when testing with larger datasets, I also sometimes get the error
The current approximation of RV `x`.ravel()[8051] is NaN.
repeated at every index of x.