Sounds like our derivatives may be less numerically stable than those in Stan, or it's due to different starting points.
Can you try to start sampling at the posterior mean found in Stan (using the start kwarg in pm.sample) to see if the problem is due to a bad initialization point?
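
Something along these lines should work as a rough sketch. The `stan_draws` dict, its parameter names (`mu`, `sigma`), and its values are made up for illustration, and the helpers `posterior_mean_start` and `sample_from_start` are hypothetical, not part of PyMC3. It also assumes scalar parameters whose names match the variable names in your PyMC3 model.

```python
from statistics import fmean


def posterior_mean_start(stan_draws):
    """Collapse per-parameter draws into a {name: posterior mean} dict.

    Assumes every parameter is a scalar, so its draws form a flat sequence.
    """
    return {name: fmean(draws) for name, draws in stan_draws.items()}


# Hypothetical posterior draws exported from Stan (placeholder values).
stan_draws = {
    "mu": [1.0, 2.0, 3.0],
    "sigma": [0.5, 1.0, 1.5],
}

start = posterior_mean_start(stan_draws)


def sample_from_start(model, start, draws=1000):
    """Sample `model`, starting the chains at `start` instead of the default point."""
    import pymc3 as pm  # imported lazily so the helper above runs without PyMC3

    with model:
        return pm.sample(draws, start=start)
```

With your model object in hand you would call `sample_from_start(model, start)`. If sampling is fine from there, the initialization point is the likely culprit; if it still fails, that points more towards the gradients.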