
Regarding pymc3.plots.energyplot(): how should I judge from the plot whether the sampler explored the marginal energy distribution extremely efficiently? Or should I just use trace.get_sampler_stats? Does that mean the plot only gives an intuitive idea of the exploration?

Also, is there a default value for the potential parameter in the original NUTS sampling method?
About NUTS sampling and energy plot

The best explanation of the energy plot is in Michael Betancourt's A Conceptual Introduction to Hamiltonian Monte Carlo: see Fig 23, Fig 34 and the related text. In brief, you want the two distributions (the marginal energy and the energy transition) to be as close as possible. If the energy transition distribution (that is, diff(Energy)) is much narrower than the marginal energy distribution, it means you don't have enough energy to explore the whole parameter space and your posterior estimation is likely biased.
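The plot can be backed by a number. A common summary of "how close are the two distributions" is the energy Bayesian fraction of missing information (E-BFMI): the mean squared energy transition divided by the variance of the marginal energy. Below is a minimal NumPy sketch of that statistic and of the two histogram widths the energy plot overlays; the function names are hypothetical and the energies are synthetic, only there to show how the numbers behave.

```python
import numpy as np


def bfmi(energy):
    """E-BFMI for one chain: mean squared energy transition / variance of energy.

    `energy` is a 1-D array holding the Hamiltonian recorded at each draw.
    Values near 0 mean the transition distribution (diff of energy) is much
    narrower than the marginal energy distribution.
    """
    energy = np.asarray(energy, dtype=float)
    return np.mean(np.diff(energy) ** 2) / np.var(energy)


def energy_widths(energy):
    """Standard deviations of the two histograms an energy plot overlays:
    the centered marginal energy and the energy transitions diff(energy)."""
    energy = np.asarray(energy, dtype=float)
    return np.std(energy - energy.mean()), np.std(np.diff(energy))


# Synthetic energies, only to illustrate the two regimes:
rng = np.random.default_rng(0)
well_mixed = rng.normal(size=2000)            # transitions about as wide as the marginal
sticky = np.cumsum(rng.normal(size=2000))     # random walk: narrow transitions, wide marginal

for name, e in [("well_mixed", well_mixed), ("sticky", sticky)]:
    marginal_sd, transition_sd = energy_widths(e)
    print(f"{name}: BFMI={bfmi(e):.3f}, marginal sd={marginal_sd:.2f}, "
          f"transition sd={transition_sd:.2f}")
```

With a real PyMC3 trace you would feed each chain's energies separately, for example from trace.get_sampler_stats('energy', combine=False). As a rule of thumb, values below roughly 0.3 (the threshold Stan's diagnostics warn at, as far as I know) suggest the sampler struggled; ArviZ also provides az.bfmi, which computes the same ratio. So the plot and the sampler stats answer the same question, one visually and one numerically.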
I don't think you should think of the potential as a single value; in that sense there is no default value in the original NUTS or HMC. The original default setting is to draw the momentum (velocity) from a standard Gaussian (you don't actually generate the potential directly). Modern implementations such as PyMC3 and Stan adapt multiple parameters to achieve optimal performance.
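To make the "standard Gaussian" point concrete, here is a minimal NumPy sketch, not PyMC3's code, of how a diagonal mass matrix M (the object PyMC3's potential argument represents, e.g. its QuadPotential classes) enters HMC: the momentum is drawn from Normal(0, M) and the kinetic energy is 0.5 * p^T M^{-1} p. M = I gives the original standard Gaussian momentum. The function names are hypothetical, and the adaptation step is a simplified stand-in for what PyMC3's QuadPotentialDiagAdapt does (it keeps a regularized running variance estimate rather than a single batch estimate).

```python
import numpy as np


def draw_momentum(inv_mass_diag, rng):
    """Draw momentum p ~ Normal(0, M) for a diagonal mass matrix M.

    With inv_mass_diag all ones (M = I) this reduces to the standard
    Gaussian momentum of the original HMC/NUTS setup.
    """
    mass_diag = 1.0 / np.asarray(inv_mass_diag, dtype=float)
    return rng.normal(size=mass_diag.shape) * np.sqrt(mass_diag)


def kinetic_energy(p, inv_mass_diag):
    """Gaussian kinetic energy K(p) = 0.5 * p^T M^{-1} p for diagonal M."""
    p = np.asarray(p, dtype=float)
    return 0.5 * np.sum(np.asarray(inv_mass_diag, dtype=float) * p ** 2)


def estimate_inv_mass_diag(warmup_draws):
    """Simplified diagonal adaptation: use per-parameter posterior variances
    estimated from warm-up draws (shape n_draws x n_params) as M^{-1}."""
    return np.var(np.asarray(warmup_draws, dtype=float), axis=0)


rng = np.random.default_rng(0)
identity = np.ones(2)
p = draw_momentum(identity, rng)                  # standard Gaussian momentum
print(kinetic_energy(p, identity))

# Warm-up draws from a posterior whose two parameters have very different scales
warmup = rng.normal(scale=[10.0, 0.1], size=(1000, 2))
adapted = estimate_inv_mass_diag(warmup)          # roughly [100, 0.01]
print(adapted, kinetic_energy(draw_momentum(adapted, rng), adapted))
```

Matching M^{-1} to the posterior scales is what lets the sampler take reasonably sized steps in every direction at once, which is why implementations tune it during warm-up instead of exposing one fixed default value.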
…it means you don't have enough energy to explore the whole parameter space and your posterior estimation is likely biased.
In this case, what can we do to improve the exploration? (Excuse me if this is too basic a question.)
Thanks a lot for sharing the paper; it's really clear. There is also a video from the same author: https://www.youtube.com/watch?v=VnNdhsm0rJQ
There is no good answer to that question, because you usually don't control the potential directly. Usually you should improve your model instead: try different reparameterizations, standardize the scale of the predictors, and use more informative priors to cut out the parts of the parameter space that might be problematic (avoid heavy-tailed priors, etc.).
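Two of those suggestions can be sketched in a few lines of NumPy, assuming a linear-regression-style model; the function names are hypothetical. standardize puts every predictor on mean 0 and standard deviation 1 and keeps what you need to map the fitted coefficients back to the raw scale. noncentered shows the identity behind the usual non-centered reparameterization of a hierarchical normal: in a PyMC3 model, z would get a Normal(0, 1) prior and theta = mu + tau * z would be a deterministic, so the sampler explores z instead of theta.

```python
import numpy as np


def standardize(X):
    """Center and scale each predictor column to mean 0 and standard deviation 1."""
    X = np.asarray(X, dtype=float)
    mean, sd = X.mean(axis=0), X.std(axis=0)
    return (X - mean) / sd, mean, sd


def unstandardize_coefs(alpha_std, beta_std, mean, sd):
    """Map intercept/slopes fitted on standardized predictors back to the raw scale.

    alpha_std + sum_j beta_std[j] * (x_j - mean[j]) / sd[j]
      = (alpha_std - sum_j beta_std[j] * mean[j] / sd[j]) + sum_j (beta_std[j] / sd[j]) * x_j
    """
    beta_std = np.asarray(beta_std, dtype=float)
    beta = beta_std / sd
    alpha = alpha_std - np.sum(beta_std * mean / sd)
    return alpha, beta


def noncentered(mu, tau, z):
    """Non-centered parameterization: if z ~ Normal(0, 1) then
    mu + tau * z ~ Normal(mu, tau), but the sampler explores z, whose
    scale does not depend on tau."""
    return mu + tau * z


rng = np.random.default_rng(0)
# Two predictors on wildly different scales
X = np.column_stack([rng.normal(0.0, 1.0, 500), rng.normal(5000.0, 1000.0, 500)])
Xs, mean, sd = standardize(X)
print(Xs.mean(axis=0).round(6), Xs.std(axis=0).round(6))
```

Both changes leave the model's meaning intact; they only reshape the space the sampler moves through, which is usually what fixes a narrow energy transition distribution.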