GPU much slower than CPU

That’s plausible. Is this overhead consistent with the slowdown after the NUTS % counting started? I mean, it’s not just the beginning of the program execution that suffers from the CPU sampler overhead.