Gumbel-Softmax version of Bernoulli and Categorical distributions

PyTorch has a built-in one too: torch.nn.functional.gumbel_softmax

https://pytorch.org/docs/stable/_modules/torch/nn/functional.html

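The function takes an eps argument (default 1e-10). Rather than clamping the y_i against division by zero, it more likely guarded the log calls in the Gumbel noise sampling against log(0). Recent releases warn that eps is deprecated and has no effect.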
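
To make the epsilon question concrete, here is a minimal pure-Python sketch of drawing one Gumbel-Softmax sample. It is not PyTorch's implementation: the names sample_gumbel and gumbel_softmax_sample, and the choice to clamp the uniform draw with eps, are assumptions made for illustration. The point is that the only risky operations are the two logarithms in the noise sampling; the softmax denominator is a sum of exponentials that always includes exp(0) = 1, so it cannot be zero.

```python
import math
import random


def sample_gumbel(rng, eps=1e-10):
    """Draw one Gumbel(0, 1) sample as -log(-log(u)) with u ~ Uniform(0, 1).

    Assumption: u is clamped to [eps, 1 - eps] so that neither logarithm
    is ever evaluated at zero. This is one possible use of an epsilon.
    """
    u = min(max(rng.random(), eps), 1.0 - eps)
    return -math.log(-math.log(u))


def gumbel_softmax_sample(logits, tau=1.0, rng=None, eps=1e-10):
    """Return a relaxed one-hot sample y with y_i >= 0 and sum(y) == 1."""
    rng = rng or random.Random()
    # Perturb each logit with Gumbel noise, then scale by the temperature.
    z = [(logit + sample_gumbel(rng, eps)) / tau for logit in logits]
    # Numerically stable softmax: subtracting the max puts exp(0) = 1 in the sum.
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Categorical case: three unnormalised log-probabilities.
y_cat = gumbel_softmax_sample([1.0, 0.5, -2.0], tau=0.5, rng=random.Random(0))

# Bernoulli case: treat it as a two-class categorical with logits log(p), log(1 - p).
p = 0.3
y_bern = gumbel_softmax_sample([math.log(p), math.log(1.0 - p)], tau=0.5,
                               rng=random.Random(1))
```

Lowering tau pushes the sample towards a one-hot vector, while a large tau flattens it towards uniform.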