Just using NUTS

I’m not sure a comparison like that makes much sense. If you disable tuning of the step size and mass matrix, you don’t get a “bare bones” NUTS, you get a broken sampler that can’t even sample from a Normal(0, 0.01) anymore:

import pymc3 as pm

with pm.Model():
    pm.Normal('a', sd=0.01)
    # No tuning: the default step size is far too large for a scale of 0.01
    trace = pm.sample(1000, tune=0)
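
This returns only diverging samples. You can confirm that from the sampler statistics; a minimal check, assuming the trace object from above and PyMC3’s get_sampler_stats API:

diverging = trace.get_sampler_stats('diverging')
# With tune=0, essentially every one of the 1000 draws should be flagged
print('%d of %d draws diverged' % (diverging.sum(), len(diverging)))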

The step size is so large relative to sd=0.01 that the integration error is always very high, so every transition diverges. I guess if you really wanted to, you could disable just the mass matrix adaptation, but then the performance of NUTS suddenly starts to depend a lot on the scaling factors in your model. Depending on the value of val you should get quite different performance, for example:

import numpy as np
import pymc3 as pm

val = 1

with pm.Model():
    pm.Normal('a', sd=np.array([0.01, val]), shape=2)
    # Fix the mass matrix to the identity instead of letting NUTS adapt it
    diag = np.ones(2)
    potential = pm.step_methods.hmc.quadpotential.QuadPotentialDiag(diag)
    step = pm.NUTS(potential=potential)
    trace = pm.sample(1000, tune=500, step=step, njobs=1)
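
To actually see the difference, you could rerun this with, say, val = 100 and compare how much work NUTS does per draw; a rough sketch, assuming the sampler records a 'depth' statistic (as PyMC3’s NUTS does):

# With the mass matrix fixed to the identity, a scale mismatch forces
# NUTS into deeper trees, i.e. more leapfrog steps per draw
depth = trace.get_sampler_stats('depth')
print('mean tree depth:', depth.mean())

When val is close to 0.01 the posterior is well conditioned and the trees should stay shallow; the larger the ratio between the two scales, the deeper the trees should get, since the step size has to adapt to the smallest scale.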