The gradients may be very costly to compute, but you shouldn’t compare nuts vs metropolis+slice based on runtime but on effective sample size. It doesn’t matter if Metropolis is 3 times as fast if it gives you 6x less effective samples.
About your questions, no need to wrap in tensor variables, that’s down automatically by PyMC already.