Minimalist Potential example not acting as expected, what'd I do wrong?

The gradients increase with the norm of the potential. Generally you need to use a learning rate that is inversely proportional to the largest gradient you’re going to see, or the optimization will diverge.
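As a quick sanity check of that scaling (a toy sketch, not the model from the question): gradient descent on f(x) = c·x²/2 has gradient c·x and converges only when the learning rate is below 2/c, so the usable rate shrinks in proportion to the gradient scale:

```python
# Toy illustration: gradient descent on f(x) = c * x**2 / 2 (gradient c * x)
# converges only when lr < 2 / c, i.e. the usable learning rate is
# inversely proportional to the gradient scale c.
c = 10000.0
for lr in (0.1 / c, 1.9 / c, 3.0 / c):
    x = 1.0
    for _ in range(50):
        x -= lr * c * x  # each update multiplies x by (1 - lr * c)
    print(f"lr={lr:.2e}: x after 50 steps = {x:.3g}")
```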

NUTS under the hood is doing gradient descent with momentum, and if you have a huge potential it will diverge just like any other solver. Try passing in

pm.sample(..., nuts_kwargs={'step_scale': 0.1/10000})
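For context, a minimal sketch of how that kwarg is passed (hypothetical model, not the one from the question; `nuts_kwargs` is the PyMC3-era spelling, newer PyMC versions take sampler kwargs directly):

```python
import pymc3 as pm

with pm.Model():
    x = pm.Flat('x')
    # Hypothetical steep potential: the 10000 factor inflates the gradients,
    # which is the situation where the default step size can overshoot.
    pm.Potential('pot', -10000.0 * x ** 2)
    # Shrink NUTS's initial step size in proportion to the gradient scale.
    trace = pm.sample(nuts_kwargs={'step_scale': 0.1 / 10000})
```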

Edit: the step_scale suggestion above is incorrect, see Minimalist Potential example not acting as expected, what'd I do wrong?

Though to be sure, integrating a Hamiltonian system with too large a step size relative to the gradient has the same numerical problems as optimizing a function with too large a step size relative to the gradient norm. I guess the tuning step just works really well at finding the step size…
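To make the analogy concrete, here’s a toy leapfrog integrator (a sketch, not PyMC’s implementation) on the quadratic Hamiltonian H(x, p) = p²/2 + k·x²/2, which is stable only for step sizes below 2/√k, mirroring the 2/c bound for gradient descent above:

```python
import numpy as np

def leapfrog(x, p, eps, n_steps, grad):
    # Standard leapfrog: half momentum step, full position step, half momentum step.
    for _ in range(n_steps):
        p -= 0.5 * eps * grad(x)
        x += eps * p
        p -= 0.5 * eps * grad(x)
    return x, p

k = 10000.0                  # steep potential k * x**2 / 2, gradient k * x
grad = lambda x: k * x
for eps in (0.001, 0.05):    # below vs. above the 2 / sqrt(k) = 0.02 threshold
    x, p = leapfrog(1.0, 0.0, eps, n_steps=30, grad=grad)
    energy = 0.5 * p ** 2 + 0.5 * k * x ** 2
    print(f"eps={eps}: final energy {energy:.3g} (started at {0.5 * k:.3g})")
```

With the small step the energy stays near its starting value; with the large one it blows up, which is exactly the divergence the sampler reports.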