Just using NUTS

I want to have a fair comparison between different MCMC methods; in particular, I want to compare the sampling performance of my method with NUTS. However, I think the sampling function in pymc3 has lots of settings/tuning which improve the performance of the sampling algorithms beyond the original methods.

So, I was wondering if anyone could tell me how to disable those tunings. In short, I just want a plain NUTS algorithm (by NUTS, I mean Algorithm 6: Efficient No-U-Turn Sampler with Dual Averaging).

Thanks

If you do something like:

# inside a model context:
step = pm.NUTS()
trace = pm.sample(step=step, tune=0)

It basically turns off all the tuning and just uses the bare-bones implementation.
However, I think over the years the NUTS sampler has improved in many small details, so the implementation might not be exactly the same as Algorithm 6 in the paper any more.

Thanks a lot for your help.

I’m not sure a comparison like that makes much sense. If you disable tuning of the step size and mass matrix, you don’t get a “bare bones” NUTS, you get a broken something that can’t even sample from a Normal(0, 0.01) any more.

import pymc3 as pm

with pm.Model():
    pm.Normal('a', sd=0.01)
    # with tune=0, the sampler keeps the initial step size
    trace = pm.sample(1000, tune=0)
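To see this concretely, you can count the divergent transitions in the sampler statistics that NUTS records (a quick check on the trace from the block above, using the standard get_sampler_stats API):

n_div = trace.get_sampler_stats('diverging').sum()
print(n_div)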

The run above returns only divergent samples, because the step size is so large that the integration error is always very high. I guess if you really wanted to, you could disable the mass matrix adaptation, but in that case the performance of NUTS suddenly starts to depend a lot on the scaling factors in your model. Depending on the value of val, you should get quite different performance. For example:

import numpy as np
import pymc3 as pm

val = 1

with pm.Model():
    pm.Normal('a', sd=np.array([0.01, val]), shape=2)
    # fix the mass matrix to the identity so it is never adapted
    diag = np.ones(2)
    potential = pm.step_methods.hmc.quadpotential.QuadPotentialDiag(diag)
    step = pm.NUTS(potential=potential)
    trace = pm.sample(1000, tune=500, step=step, njobs=1)
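If you want to quantify the difference between runs, one rough probe (illustrative, using the per-draw statistics NUTS stores in the trace) is to compare tree depths and step sizes; deeper trees mean more gradient evaluations per draw:

depth = trace.get_sampler_stats('depth')
step_size = trace.get_sampler_stats('step_size')
print(val, depth.mean(), step_size.mean())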

Thanks.
But to my understanding, NUTS is an adaptive method by itself, and Algorithm 6 of the NUTS paper tunes both L and epsilon (the step size). Could you please tell me why it needs an external algorithm for adjusting the step size?
I have done some tests with junpenglao’s suggestion; indeed, it worked very well.

Thanks again for your help :slight_smile:

Setting tune to 0 disables the step size adaptation from the paper. Without step size adaptation we just use the initial guess (which only depends on the dimension). Bad step sizes can lead to arbitrarily inefficient sampling, and there is no reason to assume that the initial guess is in any way reasonable.
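For what it’s worth, that initial guess is just a fixed scale divided by the fourth root of the dimension. A sketch of the idea (mirroring the step_scale default of 0.25 in pymc3, not the exact internals):

def initial_step_size(n_dims, step_scale=0.25):
    # pre-adaptation guess: depends only on the dimension of the model
    return step_scale / n_dims ** 0.25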
It has also been known since long before NUTS was written that the mass matrix is important for Hamiltonian methods. I don’t see what a comparison would tell you if you disable one of the features that makes NUTS useful in the first place.
We do not implement the exact algorithm from the paper any more; there have been some changes since. For example, we take the energy change into account when sampling from the tree.
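Roughly, the difference between the two selection rules looks like this (a hypothetical sketch for illustration, not the actual tree-building code):

import numpy as np

def select_slice(energies, u, rng):
    # Algorithm 6 style: uniform over candidate states inside the slice u
    ok = np.flatnonzero(np.exp(-energies) >= u)
    return rng.choice(ok)

def select_weighted(energies, rng):
    # later variant: weight candidates by their (shifted) Boltzmann factor
    w = np.exp(-(energies - energies.min()))
    return rng.choice(len(energies), p=w / w.sum())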

Edit: I forgot to pass step=step to pm.sample in the last post.

Hmmm,
So, if I set tune to something non-zero, I will get an adaptive algorithm, am I right?
Then it is fine; I can compare my method with your adaptive NUTS.

Thanks again for your quick response :slight_smile:

So, if I set the tune to something non-zero, I will get an adaptive algorithm, am I right?

yes

No problem :slight_smile:

I wasted a lot of time chasing noisy estimates for the number of effective samples. You might want to avoid that hell by using a lot of draws when you compute pm.effective_n, and double-check that you have enough draws to get stable estimates for the effective n. :slight_smile:
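Something along these lines (a minimal sketch; effective_n needs at least two chains, and the draw counts here are just illustrative):

import numpy as np
import pymc3 as pm

with pm.Model():
    pm.Normal('a', sd=np.array([0.01, 1.0]), shape=2)
    trace = pm.sample(5000, tune=1000, njobs=4)

print(pm.effective_n(trace))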