It could backtrack and adjust the learning rate, or try a new batch (in the minibatch setting). Optax, for example, has `apply_if_finite`, which would be nice for us to have in combination with other stochastic-optimization niceties like gradient clipping and learning-rate scheduling.