Softmax regression: is Theano's `softmax` function slow?

I am playing around with a softmax regression, and the sampling is very slow. Following the suggestion in the FAQ, I profiled the code, and it looks like Theano's softmax function may be the bottleneck.
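
For context, here is a minimal sketch of the kind of graph I am profiling (the variable names, toy sizes, and the plain loop that stands in for the sampler are all illustrative, not my actual script):

```python
import numpy as np
import theano
import theano.tensor as T

# Toy sizes; purely illustrative, the real data is larger.
n_samples, n_features, n_classes = 1000, 100, 10

rng = np.random.RandomState(0)
X_val = rng.randn(n_samples, n_features).astype(theano.config.floatX)
y_val = rng.randint(0, n_classes, size=n_samples).astype('int32')

X = T.matrix('X')
y = T.ivector('y')
W = theano.shared(np.zeros((n_features, n_classes),
                           dtype=theano.config.floatX), name='W')
b = theano.shared(np.zeros(n_classes, dtype=theano.config.floatX), name='b')

# The softmax over the class scores is the Softmax op that dominates the profile.
p_y_given_x = T.nnet.softmax(T.dot(X, W) + b)
nll = -T.mean(T.log(p_y_given_x)[T.arange(y.shape[0]), y])
gW, gb = T.grad(nll, [W, b])

# profile=True enables the profiler whose per-op timings are pasted below.
f = theano.function([X, y], [nll, gW, gb], profile=True)

for _ in range(100):  # stands in for the repeated calls made during sampling
    f(X_val, y_val)
```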

Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  45.0%    45.0%       1.780s       2.19e-03s     C      811       1   theano.tensor.nnet.nnet.Softmax
  17.0%    62.0%       0.672s       8.28e-04s     Py     811       1   theano.tensor.subtensor.AdvancedIncSubtensor
  16.5%    78.6%       0.654s       1.47e-05s     C    44605      55   theano.tensor.elemwise.Elemwise
   6.1%    84.7%       0.241s       4.24e-05s     C     5677       7   theano.tensor.blas.Dot22
   4.4%    89.1%       0.174s       5.36e-05s     C     3244       4   theano.tensor.blas.Dot22Scalar
   2.4%    91.5%       0.096s       1.18e-04s     C      811       1   theano.tensor.nnet.nnet.SoftmaxGrad
...

Do you have any tips on how to speed up the computation?