Why doesn't pymc.Multinomial support logit_p parameterization?

pm.Categorical supports the logit_p parameterization, which improves numerical stability by avoiding an explicit softmax when computing the log-likelihood (I presume it uses a stable log-sum-exp internally instead). The aggregated version, pm.Multinomial, currently does not support a similar logit_p argument.

Shouldn’t pm.Multinomial also benefit from the same numerical stability advantages if implemented using logits internally—computing the likelihood directly using a numerically stable log-sum-exp rather than explicitly performing a softmax and passing normalized probabilities (p)?

Or am I missing something that fundamentally prevents this numerical stability benefit in the multinomial case?
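For concreteness, here is a minimal sketch of the workaround I'm describing; the commented-out logit_p line is the hypothetical API I'm asking about, not something that currently exists:

```python
import pymc as pm
from pytensor.tensor.special import softmax

with pm.Model() as model:
    logits = pm.Normal("logits", mu=0, sigma=1, shape=3)

    # What I do now: explicit softmax, then pass normalized probabilities
    y = pm.Multinomial("y", n=10, p=softmax(logits), observed=[3, 3, 4])

    # What I'm asking about (hypothetical, not currently supported):
    # y = pm.Multinomial("y", n=10, logit_p=logits, observed=[3, 3, 4])
```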

PyTensor probably already does the optimization you have in mind if you do the explicit softmax.
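If you want to check whether that rewrite actually fires, you can inspect the compiled graph; this is just a sketch, and the exact ops printed will depend on your PyTensor version and rewrite settings:

```python
import pytensor
import pytensor.tensor as pt
from pytensor.tensor.special import softmax

x = pt.vector("x")
naive = pt.log(softmax(x))  # written the "unstable" way: log of an explicit softmax

f = pytensor.function([x], naive)
pytensor.dprint(f)  # after rewrites, this should show a fused, stable log-softmax op
```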

For the Bernoulli case, using logit_p=x or doing p=inverse_logit(x) is exactly the same: both cases are optimized numerically even if the user doesn't know about logit_p; it's there just for convenience. We can add the same convenience for Multinomial, I think there's a GitHub issue for that.
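As a quick sanity check (a sketch, not a guarantee for every possible graph), the two Bernoulli parameterizations give matching log-probabilities even at extreme logits, where a naive sigmoid-then-log would underflow:

```python
import numpy as np
import pymc as pm

x = np.array([-30.0, 0.0, 30.0])  # extreme logits
value = np.array([1, 1, 1])

logp_logit = pm.logp(pm.Bernoulli.dist(logit_p=x), value).eval()
logp_p = pm.logp(pm.Bernoulli.dist(p=pm.math.invlogit(x)), value).eval()

print(logp_logit)  # roughly [-30, -0.693, ~0]
print(logp_p)      # should match, thanks to PyTensor's graph rewrites
```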


Oh that’s awesome! I was wondering why I had not seen any issues at all using the softmax and passing in p. I keep forgetting that this is all symbolic. I really need to sit down and dig into PyTensor and how it works. Someday™!

I find this terminology confusing because the arguments to pm.Categorical are not logit probs, they’re log probs. If you start with the probability simplex (a vector of non-negative values that sum to 1), then apply logit to it, then apply softmax, you don’t get the original probabilities back. If you start with the probability simplex, then apply log to it, then apply softmax, you get back to where you started. So what are being called logits by everyone everywhere are not really logits. I have no idea where the convention of calling them logits comes from, but it’s so strong that I misnamed Stan’s categorical_logit function—it should be categorical_log. I explain more fully in this blog post: https://statmodeling.stat.columbia.edu/2024/12/26/those-are-unnormalized-log-probabilities-not-logits-in-your-neural-networks-final-layer/
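A quick numeric illustration of the point (my own example, using SciPy’s softmax and logit for convenience, not taken from the blog post):

```python
import numpy as np
from scipy.special import softmax, logit

p = np.array([0.2, 0.3, 0.5])  # a point on the probability simplex

print(softmax(np.log(p)))  # [0.2 0.3 0.5]  -- log then softmax recovers p
print(softmax(logit(p)))   # approx [0.149 0.255 0.596] -- logit then softmax does not
```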

If you accept log probs as an argument, then you don’t need to do a softmax or log-sum-exp, but if you need to do error checking, the check is that the arguments have a log-sum-exp of 0. If you accept unnormalized log probs as input, the operation you actually need for the categorical and multinomial is log(softmax(x)) = x - log_sum_exp(x).
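For example, in plain NumPy/SciPy terms (just to illustrate the identity and why the naive form breaks down):

```python
import numpy as np
from scipy.special import logsumexp, log_softmax

x = np.array([1000.0, 1001.0, 1002.0])  # unnormalized log probs; exp(x) overflows

print(x - logsumexp(x))                     # stable log(softmax(x))
print(log_softmax(x))                       # same result, built in
print(np.log(np.exp(x) / np.exp(x).sum()))  # naive form overflows to nan
```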
