Running on the GPU carries a large overhead because the sampler runs in Python (on the CPU) while the logp function and its gradient run on the GPU (via Theano), so every sampling step has to cross between the two. Unless your model involves very large matrix operations with expensive gradient computations that can actually exploit the GPU, using the GPU is usually not worth it.
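To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It is a toy cost model, not a benchmark: the function names, the throughput figures, and the per-call overhead are all hypothetical numbers chosen for illustration, and real values depend on your hardware and model.

```python
# Toy cost model for one logp/gradient evaluation.
# All rates and overheads are hypothetical, not measurements.

def time_per_call(n_flops, flops_per_sec, overhead_sec):
    """Estimated wall time of a single evaluation: fixed overhead plus compute."""
    return overhead_sec + n_flops / flops_per_sec


def gpu_is_faster(n_flops, cpu_rate=1e10, gpu_rate=1e12, gpu_overhead=1e-4):
    """Compare the CPU (assumed zero extra overhead) against the GPU."""
    cpu = time_per_call(n_flops, cpu_rate, 0.0)
    gpu = time_per_call(n_flops, gpu_rate, gpu_overhead)
    return gpu < cpu


def break_even_flops(cpu_rate=1e10, gpu_rate=1e12, gpu_overhead=1e-4):
    """Work per call at which both devices take the same time.

    Solves n / cpu_rate = gpu_overhead + n / gpu_rate for n.
    """
    return gpu_overhead / (1.0 / cpu_rate - 1.0 / gpu_rate)


if __name__ == "__main__":
    for n in (1e4, 1e6, 1e9):
        print(f"{n:.0e} flops per call -> GPU faster: {gpu_is_faster(n)}")
    print(f"break-even at about {break_even_flops():.3g} flops per call")
```

Under these assumed numbers, the fixed cost paid on every sampler step dominates for small models, and the GPU only pulls ahead once each evaluation does enough work to amortize it.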