In the docs example, the final output is modeled via a Bernoulli sample. How should I handle this for a multi-class classifier? Should I use a container of Bernoulli samples, one for each class, or some other distribution?
+1 to using a Categorical likelihood. If you search for Categorical regression or multinomial regression here, you should find a few discussions you can use as inspiration.
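To make the suggestion concrete, here is a minimal sketch in plain NumPy of what a Categorical likelihood computes: the network produces one logit per class, a softmax turns those into probabilities, and each observation contributes the log-probability of its observed class. The data, shapes, and function names below are made up for illustration and are not taken from the docs example.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_log_likelihood(logits, labels):
    # Log-probability of each observed integer label under
    # Categorical(softmax(logits)).
    probs = softmax(logits)
    return np.log(probs[np.arange(len(labels)), labels])

# Hypothetical data: 4 observations, 3 classes (one logit per class).
logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0],
                   [-1.0, 3.0, 0.5],
                   [0.2, 0.1, 4.0]])
labels = np.array([0, 2, 1, 2])  # integer class indices, not one-hot

probs = softmax(logits)                                # shape (4, 3), rows sum to 1
log_lik = categorical_log_likelihood(logits, labels)   # shape (4,)
```

Note that the labels are integer class indices and the last axis of the logits has one entry per class, so a single Categorical replaces the "one Bernoulli per class" idea.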
I think I see what I was doing wrong, but I need to test a bit. My last layer was outputting 16 variables instead of 3, which should be the per-class probabilities.
Nevertheless, the NN doesn’t learn anything. The loss keeps parkouring between nan and inf.
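One common source of nan/inf losses in multi-class models, though not necessarily the cause here, is computing log(softmax(x)) naively so that large logits overflow. The sketch below is a plain NumPy illustration with deliberately extreme, made-up logits; it compares a naive version with a log-sum-exp style version that stays finite.

```python
import numpy as np

def naive_log_softmax(logits):
    # Exponentiates raw logits, so large values overflow to inf.
    exp = np.exp(logits)
    return np.log(exp / exp.sum(axis=-1, keepdims=True))

def log_softmax(logits):
    # Shift by the row-wise max first, then work in log space.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

# Hypothetical, deliberately extreme logits for a 3-class problem.
big_logits = np.array([[1000.0, 0.0, -1000.0]])

# Silence the overflow/invalid warnings that the naive version triggers.
with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
    naive = naive_log_softmax(big_logits)   # contains nan and -inf

stable = log_softmax(big_logits)            # all entries finite
```

If the logits in the real model grow this large, it may also be worth checking the input scaling and the prior widths on the weights; that is a guess about a typical setup, not a diagnosis of this one.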