I’m not sure a comparison like that makes much sense. If you disable tuning of the step size and mass matrix, you don’t get a “bare bones” NUTS, you get a broken sampler that can’t even sample from a Normal(0, 0.01) anymore:
```python
import pymc3 as pm

with pm.Model():
    pm.Normal('a', sd=0.01)
    trace = pm.sample(1000, tune=0)
```
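This returns only diverging samples; you can confirm that from the sampler statistics (a minimal check, assuming the `diverging` statistic that NUTS records in recent PyMC3 versions):

```python
# Fraction of draws flagged as divergent -- with tune=0 this is ~1.0.
print(trace.get_sampler_stats('diverging').mean())
```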
The step size is simply so large that the integration error is always very high. I guess if you really wanted to, you could disable just the mass matrix adaptation, but then the performance of NUTS suddenly starts to depend a lot on the scaling of the variables in your model. Depending on the value of `val`, you should get quite different performance, for example:
```python
import numpy as np
import pymc3 as pm

val = 1
with pm.Model():
    pm.Normal('a', sd=np.array([0.01, val]), shape=2)
    # Fix the mass matrix to the identity so it is not adapted.
    diag = np.ones(2)
    potential = pm.step_methods.hmc.quadpotential.QuadPotentialDiag(diag)
    step = pm.NUTS(potential=potential)
    trace = pm.sample(1000, tune=500, step=step, njobs=1)
```
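To see the effect, you could rerun this for a couple of values of `val` and compare effective sample sizes (just an illustrative sketch; the values are arbitrary, and `pm.effective_n` is the diagnostics helper from the same PyMC3 versions):

```python
import numpy as np
import pymc3 as pm

for val in [1.0, 100.0]:
    with pm.Model():
        pm.Normal('a', sd=np.array([0.01, val]), shape=2)
        potential = pm.step_methods.hmc.quadpotential.QuadPotentialDiag(np.ones(2))
        step = pm.NUTS(potential=potential)
        trace = pm.sample(1000, tune=500, step=step, njobs=1)
    # With a fixed identity mass matrix the step size has to resolve the
    # smallest scale (0.01), so a large `val` means far more leapfrog steps
    # per draw and usually a much lower effective sample size per second.
    print(val, pm.effective_n(trace))
```

With mass matrix adaptation left on, NUTS would rescale both dimensions itself and the choice of `val` would hardly matter.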