I have the following Deep Learning model:
X = theano.shared(X_train)
Y = theano.shared(np.where(y_train==1)[1])
X_off = theano.shared(X_offset_train)
h = [n_features, 10, 10, 1]
inits = []
for i in range(len(h)-1):
    inits.append(np.random.randn(h[i], h[i+1]) *
                 np.sqrt(2 / (h[i] + h[i+1])))
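For reference, the loop above is a Glorot/Xavier-style initialization: each weight matrix is scaled so its standard deviation is sqrt(2 / (fan_in + fan_out)). A minimal NumPy sketch (with hypothetical layer sizes standing in for `n_features`) confirms the resulting scale:

```python
import numpy as np

# Hypothetical layer sizes standing in for [n_features, 10, 10, 1].
h = [50, 10, 10, 1]

rng = np.random.default_rng(0)
inits = [rng.standard_normal((h[i], h[i + 1])) * np.sqrt(2.0 / (h[i] + h[i + 1]))
         for i in range(len(h) - 1)]

# Empirical std of the first weight matrix should be close to
# sqrt(2 / (50 + 10)) = sqrt(2 / 60).
print(inits[0].shape)
print(abs(inits[0].std() - np.sqrt(2 / 60)) < 0.02)
```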
with pm.Model() as dl_model:
    ws = []
    logit = X
    for i in range(len(h)-1):
        ws.append(pm.Normal('w{0}'.format(i), 0, sd=1,
                            shape=(h[i], h[i+1]),
                            testval=inits[i]))
        w_repeat = pm.math.block_diagonal([ws[-1]]*max_horse)
        logit = tt.nnet.relu(tt.dot(logit, w_repeat))
    p = tt.nnet.softmax(logit + X_off)
    out = pm.Categorical('out', p, observed=Y)
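One classic source of NaN in a model like this is overflow inside softmax when the unbounded ReLU logits grow large (`tt.nnet.softmax` applies the max-subtraction stabilization internally, but the failure mode is worth understanding when hunting NaNs). A plain NumPy sketch of the problem and the log-sum-exp-style fix:

```python
import numpy as np

def softmax_naive(z):
    # Overflows for large logits: exp(1000) -> inf, inf / inf -> nan.
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_stable(z):
    # Subtracting the row max leaves the result mathematically unchanged
    # but keeps every exponent <= 0, so nothing overflows.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[1000.0, 0.0, -1000.0]])
print(np.isnan(softmax_naive(logits)).any())  # True: overflow produced NaN
print(softmax_stable(logits))                 # finite, sums to 1
```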
During posterior inference I get a "FloatingPointError: NaN occurred in optimization" error. The optimization is invoked with the following code block.
batch_size = 256
X_m = pm.Minibatch(X_train, batch_size)
X_offset_m = pm.Minibatch(X_offset_train, batch_size)
Y_m = pm.Minibatch(np.where(y_train==1)[1], batch_size)
with dl_model:
    approx = pm.fit(100000,
                    more_replacements={X: X_m, X_off: X_offset_m, Y: Y_m},
                    callbacks=[pm.callbacks.CheckParametersConvergence(tolerance=1e-4)])
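One way to localize this kind of failure outside Theano is to replay a batch through an equivalent NumPy forward pass and report the first layer whose output goes non-finite. This is only a debugging sketch with hypothetical shapes and a deliberately corrupted weight matrix, not the model's actual weights:

```python
import numpy as np

def first_nonfinite_layer(x, weights):
    """Forward-pass x through ReLU layers (mirroring tt.nnet.relu(tt.dot(...)));
    return the index of the first layer whose output contains NaN/inf,
    or -1 if everything stays finite."""
    out = x
    for i, w in enumerate(weights):
        out = np.maximum(out @ w, 0.0)
        if not np.isfinite(out).all():
            return i
    return -1

# Hypothetical batch and weights; the second matrix is corrupted with inf.
rng = np.random.default_rng(1)
x = rng.standard_normal((4, 5))
ws = [rng.standard_normal((5, 3)), rng.standard_normal((3, 2))]
ws[1][0, 0] = np.inf

print(first_nonfinite_layer(x, ws))  # 1: the corrupted layer is flagged
```

The same idea applies to checking the minibatch inputs themselves (`np.isfinite(batch).all()`) before blaming the model.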
The question is how I can go about debugging which values caused this error. Can I somehow run the last minibatch through the model so that I can inspect the log-likelihood via the out variable? If not, any tips on what may have gone wrong would be appreciated. None of my training data contain NaNs.
Edit: I should have mentioned that this occurs at random iteration numbers. The last time it ran, it reached iteration 28091/100000 before stopping.
Edit 2: It turns out the solution in this case was to have no activation on the last layer, so the model needed this instead:
logit = tt.dot(logit, w_repeat) + b_repeat
if i < len(h) - 2:
    logit = tt.tanh(logit)
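A quick NumPy illustration of why the final activation matters here: a saturating activation like tanh caps the logits at ±1, so the softmax can never express a confident class probability, whereas raw (unbounded) logits can. This is a sketch of the general effect, not a reproduction of the model above:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax (subtract the row max before exponentiating).
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

raw = np.array([[8.0, 0.0, -8.0]])   # unconstrained final-layer logits
squashed = np.tanh(raw)              # bounded to (-1, 1)

p_raw = softmax(raw)
p_tanh = softmax(squashed)

print(p_raw.max() > 0.99)   # True: confident prediction is reachable
print(p_tanh.max() < 0.7)   # True: tanh-squashed logits stay near uniform
```

Near-uniform probabilities also mean tiny, nearly flat gradients for the optimizer, which can aggravate exactly the kind of numerical trouble seen above.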