To get into the weeds a little, HMC (and NUTS) simulates some dynamics where energy is conserved. The integrator is very good, but also can be (arbitrarily) bad. Luckily, you can just check how much the energy has changed from the initial conditions to tell how bad the integrator has done. We then do a Metropolis-Hastings correction using this energy (where the probability of acceptance is exp(-energy_change)) so that it is valid MCMC (i.e., converges to the stationary distribution).
A “divergence” is just when the energy_change is more than 1,000. That indicates that the integrator did very very poorly (note that exp(-1000) == 0 in float64, so the transition has no chance of being accepted). But a divergence is not being defined mathematically here.
Anyways, yeah, I think that probably if you start at a point and the sampled momentum sends your group_means negative, then the gradients might get weird and give you weird numbers, but hard to tell without checking.