I haven’t run the code yet, but did you get any warnings alongside the output? (You should see them if you have the latest version of PyMC.) Specifically, I’m thinking of a maximum tree depth warning. You can get the tree depth by putting this at the end of the code:
print(trace.sample_stats['tree_depth'])
which will return an array of the tree depth reached on each draw. If you have lots of 9s (for example) in the array, the sampler is building trees of depth 9, which means up to 2^9 = 512 leapfrog steps per draw. Put simply, it is spending a lot longer calculating the Hamiltonian trajectory at each sampling step, and so the sampling takes longer.
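To make that array easier to read, here is a rough sketch of how you could tally the depths. `summarize_tree_depth` is a hypothetical helper of my own, not part of PyMC, and the default limit of 10 is my assumption about the NUTS `max_treedepth` setting, so adjust it if you changed that. The example values are made up.

```python
from collections import Counter


def summarize_tree_depth(depths, max_treedepth=10):
    """Tally NUTS tree depths (hypothetical helper, not a PyMC function).

    max_treedepth=10 is assumed to match the sampler's limit.
    """
    depths = [int(d) for d in depths]
    counts = Counter(depths)
    n = len(depths)
    return {
        # How many draws ended at each depth
        "counts": dict(sorted(counts.items())),
        # Upper bound on leapfrog steps for a tree of each depth
        "max_leapfrog_steps": {d: 2 ** d for d in sorted(counts)},
        # Share of draws that hit the assumed depth limit
        "frac_at_max": sum(d >= max_treedepth for d in depths) / n if n else 0.0,
    }


# With a real trace you would flatten the chains and draws first, e.g.:
# depths = trace.sample_stats["tree_depth"].values.ravel()
example_depths = [3, 9, 9, 10, 9, 4]  # made-up values for illustration
print(summarize_tree_depth(example_depths))
```

If most draws sit at or near the limit, that would point to the trajectory length as the main cost rather than anything else in the model.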
Just a preliminary thought before diving in, since this is something that is affecting my own model at the moment! Have you done any other profiling on this code that you can share? I think PyMC has a profiling function (`model.profile`, though I have yet to use it) which could show where the computational expense is going.
Edit: I’m running the code now so I can look at the tree depth bit