Any possible ways to make pm.sample() faster?

Computing one step of the Metropolis algorithm is certainly cheaper than computing one step of NUTS, but you can’t extrapolate from that fact to a statement about relative speed. Doing so ignores the total number of evaluations required to get a posterior estimate. For a fixed target effective sample size (ESS), Metropolis may (and does!) require hundreds or thousands of times more logp evaluations than NUTS, so the more computationally expensive NUTS steps end up saving time per effective sample.
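To make the accounting concrete, here is a rough back-of-the-envelope sketch. Every number in it is made up for illustration and is not a benchmark: the draw counts, the ESS values, the number of gradient evaluations per NUTS draw, and the assumed cost of a gradient relative to a plain logp evaluation are all assumptions. The helper `cost_per_effective_sample` is hypothetical and not part of PyMC. For a real model you would plug in measured values, for example ESS from `arviz.ess` on your own traces.

```python
def cost_per_effective_sample(cost_per_eval, evals_per_draw, n_draws, ess):
    """Total compute (in units of one logp evaluation) divided by ESS.

    Hypothetical helper for illustration only, not a PyMC function.
    """
    total_cost = cost_per_eval * evals_per_draw * n_draws
    return total_cost / ess


# Assumed Metropolis run: one logp evaluation per draw, many draws,
# but strong autocorrelation leaves a small ESS.
metropolis = cost_per_effective_sample(
    cost_per_eval=1.0,   # a plain logp evaluation (the unit of cost)
    evals_per_draw=1,
    n_draws=100_000,
    ess=50,              # assumed, not measured
)

# Assumed NUTS run: several leapfrog steps per draw, each needing a
# gradient that we assume costs about 3x a logp evaluation.
nuts = cost_per_effective_sample(
    cost_per_eval=3.0,   # assumed gradient cost relative to logp
    evals_per_draw=15,   # assumed average leapfrog steps per draw
    n_draws=1_000,
    ess=800,             # assumed, not measured
)

print(f"Metropolis: {metropolis:.2f} logp-equivalents per effective sample")
print(f"NUTS:       {nuts:.2f} logp-equivalents per effective sample")
```

With these assumed inputs, each NUTS draw costs 45 times more than a Metropolis draw, yet NUTS comes out far cheaper per effective sample. The point is the shape of the calculation, not the specific figures: what matters is total cost divided by ESS, not cost per step.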