NUTS uses all cores

Yeah, this comes from bwengals's reply (#4) in the "Multidimensional gaussian process" thread:

The matrix operations used by Theano here are multithreaded, so running multiple chains simultaneously bogs things down.

I am not sure how to limit it to a single thread per chain, though.
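
One thing that might be worth trying (a sketch I have not verified on this model): the usual BLAS/OpenMP backends read thread-count environment variables when they load, so setting them before importing numpy/Theano may cap each chain's process at one thread. Which variable actually matters depends on the BLAS your install is linked against, and `limit_blas_threads` is just a hypothetical helper name, not something from PyMC or Theano.

```python
import os

# Environment variables read by common threaded backends (OpenMP, MKL, OpenBLAS).
# Which one takes effect depends on the BLAS your numpy/Theano build uses.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_blas_threads(n_threads=1):
    """Hypothetical helper: request n_threads per process from the BLAS backend.

    Must run before numpy / theano / pymc are imported, because the backends
    typically read these variables only once, at load time.
    """
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)


limit_blas_threads(1)

# Only import the heavy libraries after the variables are set, e.g.:
# import pymc3 as pm
```

Chains sampled in separate worker processes should inherit the environment, so in principle each chain would then stay on a single thread. Setting the same variables in the shell before launching Python should have the same effect.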