I find this terminology confusing because the arguments to pm.Categorical are not logit probs, they’re log probs. If you start with a point on the probability simplex (a vector of non-negative values that sum to 1), apply logit to it, and then apply softmax, you don’t get the original probabilities back. If you start with the probability simplex, apply log, and then apply softmax, you do get back to where you started. So what are being called logits by everyone everywhere are not really logits. I have no idea where the convention of calling them logits comes from, but it’s so strong that I even misnamed Stan’s categorical_logit function, which should really be categorical_log. I explain more fully in this blog post: https://statmodeling.stat.columbia.edu/2024/12/26/those-are-unnormalized-log-probabilities-not-logits-in-your-neural-networks-final-layer/
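To see the difference concretely, here’s a minimal sketch in NumPy/SciPy (not from the original post, just an illustration): softmax undoes an elementwise log, but not an elementwise logit.

```python
import numpy as np
from scipy.special import softmax, logit

p = np.array([0.2, 0.3, 0.5])  # a point on the probability simplex

print(softmax(np.log(p)))   # [0.2 0.3 0.5] -- softmax inverts log (up to normalization)
print(softmax(logit(p)))    # roughly [0.15 0.26 0.60] -- softmax does NOT invert logit
```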
If you accept normalized log probs as an argument, then you don’t need to do softmax or log-sum-exp; if you want error checking, the check is that the arguments have a log-sum-exp of 0 (i.e., the implied probabilities sum to 1). If you accept unnormalized log probs as input, the operation you actually need for the categorical and multinomial is log(softmax(x)) = x - log_sum_exp(x).
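Here’s what that looks like in code, again just a sketch with SciPy (the helper name log_softmax and the example values are mine, not from any library discussed above):

```python
import numpy as np
from scipy.special import logsumexp

def log_softmax(x):
    # log(softmax(x)) = x - log_sum_exp(x), computed stably
    return x - logsumexp(x)

alpha = np.array([1.3, -0.2, 2.7])   # unnormalized log probs
log_p = log_softmax(alpha)           # normalized log probs

# error check for inputs claimed to be normalized log probs:
# their log-sum-exp should be 0 (probabilities sum to 1)
assert np.isclose(logsumexp(log_p), 0.0)
```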