Step method runtime comparisons

Hi all,

I am looking to compare the NUTS sampler to Metropolis for a problem of mine: I run NUTS for 5000 samples and Metropolis for 5000 samples to illustrate that, as expected, NUTS shows stronger convergence for the same number of samples. However, my problem has some tough geometries to sample from (I will probably try to reparameterise, but that is a separate issue :sweat_smile:), so NUTS naturally spends much more time computing its trajectories: NUTS takes hours while Metropolis takes a minute or two. I know this is to be expected, but in naive Python code like mine, where you just tell PyMC how many samples to draw, it feels like a fairer comparison (from an application perspective) would be to run NUTS for x samples, let Metropolis run for the same amount of wall-clock time, and then compare the results.
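
In case it helps the discussion, here is a minimal sketch of the kind of comparison I mean, using the `callback` argument of `pm.sample` (as I understand the docs, raising `KeyboardInterrupt` inside a callback stops sampling and returns whatever draws were collected so far). The model, draw counts, and single-chain setup below are just placeholders to keep the timing logic simple:

```python
import time
import pymc as pm

class TimeBudget:
    """Stop sampling once a wall-clock budget (in seconds) is used up."""

    def __init__(self, seconds):
        self.deadline = time.monotonic() + seconds

    def __call__(self, trace, draw):
        # pm.sample calls this for every draw; raising KeyboardInterrupt
        # ends sampling and keeps the draws collected up to that point.
        if time.monotonic() > self.deadline:
            raise KeyboardInterrupt

with pm.Model():  # stand-in model, not my actual tough geometry
    x = pm.Normal("x")

    # Run NUTS for a fixed number of draws and time it.
    start = time.monotonic()
    idata_nuts = pm.sample(draws=5000, chains=1, cores=1)
    nuts_seconds = time.monotonic() - start

    # Give Metropolis the same wall-clock budget by requesting far more
    # draws than it can finish and cutting it off with the callback.
    idata_mh = pm.sample(
        draws=10_000_000,
        step=pm.Metropolis(),
        chains=1,
        cores=1,
        callback=TimeBudget(nuts_seconds),
    )
```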

I am posting this here partly in case I have missed an existing function for profiling in this way, and partly in case it could be a good discussion to have, perhaps for pymc_experimental. I am aware that there are many model-specific catches and caveats that make NUTS runtime vary (sometimes a lot), which would be worth flagging for anyone naively relying on a strict runtime comparison, but I am still curious to get input!
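
For concreteness, the metric I would then compute on the two traces is something like effective samples per second of wall-clock time, continuing the sketch above (`x` is the placeholder variable from that toy model):

```python
import arviz as az

# ESS per second is one common apples-to-apples way to compare samplers:
# both runs used roughly the same nuts_seconds budget here.
ess_nuts = az.ess(idata_nuts)["x"].item() / nuts_seconds
ess_mh = az.ess(idata_mh)["x"].item() / nuts_seconds
print(f"NUTS: {ess_nuts:.1f} ESS/s, Metropolis: {ess_mh:.1f} ESS/s")
```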