Infinite Loss in Multi-Class Bayesian Neural Network

I was playing around with a very similar example here and ran into this problem. I found that the logits coming out of the network were sometimes extreme enough that the softmax assigned the target label a probability of exactly zero in floating point, which makes the log-likelihood infinite. Clipping the logits was enough to prevent this. I used `pt.clip(logits, -100, 100)`, but I have no idea if that's a good range (or if it matters).
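
In case it's useful, here's a minimal sketch of where the clip goes. The data and the single linear layer are stand-in placeholders for your actual network, and it assumes a recent PyMC where `Categorical` accepts `logit_p` (with older versions you'd apply a softmax and pass `p` instead):

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

# Toy stand-ins for the real data: 100 samples, 4 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 3, size=100)

with pm.Model():
    # A single linear layer as a placeholder for the actual network.
    W = pm.Normal("W", mu=0.0, sigma=1.0, shape=(4, 3))
    b = pm.Normal("b", mu=0.0, sigma=1.0, shape=3)
    logits = pt.dot(X, W) + b

    # Clip the raw logits so the softmax can never assign the
    # observed class a probability of exactly zero in floating point.
    clipped = pt.clip(logits, -100, 100)

    pm.Categorical("y_obs", logit_p=clipped, observed=y)

    idata = pm.sample()
```

On the range: clipping to ±100 bounds the largest logit gap at 200, so the smallest softmax probability is roughly exp(-200) ≈ 1e-87, far above where float64 underflows (around exp(-745)). So the exact bounds probably don't matter much, as long as the logit spread stays well under ~700.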
