It seems Theano does not have the same softmax capability as TensorFlow. The Theano version only accepts a matrix (2D) input. I have N x T matrices, each of dimension S x S.
How have people been getting around this missing low-level function? Do you do the exp, reduce-sum, and division one step at a time?
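
For reference, here is roughly what I mean by doing it one step at a time, sketched in NumPy rather than Theano so it runs on its own. The assumptions are mine: the softmax is taken over the last axis of an N x T x S x S tensor, the sizes are made up, and the function names `softmax_last_axis` and `softmax_via_reshape` are just illustrative. The second function shows the other workaround I can think of, which is flattening to a matrix, applying a matrix-only softmax row by row, and reshaping back. I would expect the Theano version to use the matching tensor ops, but I have not verified that.

```python
import numpy as np


def softmax_last_axis(x):
    """Softmax over the last axis, built from elementary ops."""
    # Subtract the per-row max first so exp() cannot overflow.
    shifted = x - x.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    # keepdims=True keeps the reduced axis so the division broadcasts.
    return exps / exps.sum(axis=-1, keepdims=True)


def softmax_via_reshape(x):
    """Same result by flattening to 2D, as a matrix-only softmax would need."""
    flat = x.reshape(-1, x.shape[-1])   # (N*T*S, S) matrix
    out = softmax_last_axis(flat)       # stand-in for a matrix-only softmax
    return out.reshape(x.shape)         # back to (N, T, S, S)


# Made-up sizes for illustration only.
N, T, S = 2, 3, 4
x = np.random.RandomState(0).randn(N, T, S, S)
y = softmax_last_axis(x)
y_flat = softmax_via_reshape(x)
```

Both routes should give the same values; the reshape route only works directly when the softmax axis is the last one, otherwise the axes would need to be permuted first.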