Why does the multinomial distribution work in this case?

Suppose I have the following model:

import numpy as np
import pymc3 as pm

# some_values: observed counts, an array of shape (16, 2)

with pm.Model() as model_hier:

    packed_L = pm.LKJCholeskyCov('packed_L', n=2, eta=2., sd_dist=pm.HalfCauchy.dist(5))

    L = pm.expand_packed_triangular(2, packed_L)
    Sigma = pm.Deterministic('Sigma', L.dot(L.T))

    mu = pm.Normal('mu', 0, 20, shape=2)
    beta = pm.MvNormal('beta', mu=mu, chol=L, shape=(16, 2))

    alphas = pm.invlogit(beta)

    post = pm.Multinomial('post', n=np.sum(some_values, axis=1), p=alphas,
                          observed=some_values)

The goal is to find mu, beta and Sigma; some_values is an array of shape [16, 2]. Although there are many divergences, what I find weird is that the Multinomial works at all. Why? Almost every row of alphas (an array of shape [16, 2]) does not sum to one, and the probabilities of a multinomial distribution must sum to one. Yet it samples without any problem (besides the divergences, of course).

So, why does the multinomial work in this case?

Hi David,
I think it’s because the Multinomial’s p is automatically rescaled under the hood. As the doc says for p (emphasis mine):

Probability of each one of the different outcomes. Elements must be non-negative and sum to 1 along the last axis. They will be automatically rescaled otherwise.

And indeed, in the source code you can see: p = p / tt.sum(p, axis=-1, keepdims=True).
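
To make that concrete, here is a quick sketch of what that rescaling does to one row of your alphas (plain numpy instead of Theano, and the numbers are made up):

import numpy as np

# a hypothetical row of alphas: each entry is in (0, 1) after invlogit,
# but the row does not sum to 1
alphas_row = np.array([0.7, 0.9])

# the same rescaling the Multinomial applies along the last axis
p = alphas_row / np.sum(alphas_row, axis=-1, keepdims=True)
print(p)         # [0.4375 0.5625]
print(p.sum())   # 1.0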
So, I think that's why your model runs despite alphas not summing to one. I wouldn't say it "works" though, as you have many divergences. These could come from the parametrization (you're using the inverse logit instead of a softmax as the link function) and from your wide priors (they should probably be more regularizing); there's a sketch of what I mean below.
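
Here is a rough, untested sketch of the softmax parametrization, reusing your some_values placeholder and tightening the prior on mu as an example; treat it as a starting point rather than a guaranteed fix for the divergences:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model_softmax:

    packed_L = pm.LKJCholeskyCov('packed_L', n=2, eta=2., sd_dist=pm.HalfCauchy.dist(5))
    L = pm.expand_packed_triangular(2, packed_L)
    Sigma = pm.Deterministic('Sigma', L.dot(L.T))

    mu = pm.Normal('mu', 0, 5, shape=2)   # tighter than Normal(0, 20)
    beta = pm.MvNormal('beta', mu=mu, chol=L, shape=(16, 2))

    # softmax makes each row sum to 1 by construction, so the Multinomial
    # doesn't have to rescale anything under the hood
    alphas = tt.nnet.softmax(beta)

    post = pm.Multinomial('post', n=np.sum(some_values, axis=1), p=alphas,
                          observed=some_values)

Note that with a full softmax the rows of beta are only identified up to an additive constant; pinning one column to zero is a common way around that.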

Hope this helps :vulcan_salute: