The basic state of my model is, for some M by N matrix A:
```python
x = pm.Normal('x', 0, 1, shape=N)
combined = pm.math.dot(A, x)
scaled = pm.math.sigmoid(combined)
outcomes = pm.Bernoulli('outcome', scaled, observed=Y)
pm.fit(n=50000)
```
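For reference, the shapes in the deterministic part work out like this in plain NumPy (a toy sketch; the sizes and simulated data here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 10000, 50                       # made-up sizes for illustration
A = rng.normal(size=(M, N))
x = rng.normal(size=N)                 # plays the role of the latent vector

combined = A @ x                       # shape (M,): one value per row of A
scaled = 1 / (1 + np.exp(-combined))   # sigmoid, one probability per row
Y = rng.binomial(1, scaled)            # simulated outcomes, shape (M,)
```

So `scaled` and `Y` are both length-M vectors, matched row for row.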
This works well as-is; however, inference is getting extremely slow as my data grows. I saw the `pm.Minibatch` class and some documentation around it, and it looked promising. However, if I change the model to
```python
Y_batch = pm.Minibatch(Y, batch_size=100)
x = pm.Normal('x', 0, 1, shape=N)
combined = pm.math.dot(A, x)
scaled = pm.math.sigmoid(combined)
outcomes = pm.Bernoulli('outcome', scaled, observed=Y_batch, total_size=Y.shape)
pm.fit(n=50000)
```
I get the error
```
Input dimension mis-match. (input.shape = 100, input.shape = 10000)
```
I'm guessing this is because the observed RV's probability, `scaled`, is not a scalar that we've observed many times, but rather a vector with a one-to-one correspondence to the rows of `Y`. Is there any way to iterate through this in minibatches?
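To make the mismatch concrete: with only `Y` minibatched, the likelihood sees a probability vector that is still length M alongside observations of length 100. A NumPy sketch of the two shapes (sizes made up to match the error message):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, batch = 10000, 50, 100           # made-up sizes matching the error
A = rng.normal(size=(M, N))
x = rng.normal(size=N)

scaled = 1 / (1 + np.exp(-(A @ x)))    # still length M -- A was not minibatched
Y = rng.binomial(1, 0.5, size=M)
Y_batch = Y[:batch]                    # roughly what pm.Minibatch hands the likelihood

# These lengths (10000 vs 100) don't broadcast, which looks like
# exactly the "input.shape = 100, input.shape = 10000" error above.
assert scaled.shape[0] != Y_batch.shape[0]
```

My guess is that the rows of `A` need to be sliced together with `Y`, something like `A_batch, Y_batch = pm.Minibatch(A, Y, batch_size=100)` followed by `combined = pm.math.dot(A_batch, x)`, but I haven't been able to confirm that this is the right pattern.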
Edit: when testing with larger datasets, I am also sometimes getting the error
```
The current approximation of RV `x`.ravel() is NaN.
```

at every index of `x`.