Multinomial with Random Effects on panel data (shape issues)

I am working on a hierarchical multinomial model with partial pooling (over some common hyper priors). Basically something similar to a multinomial with random effect on panel data.

My dataset has 4 predictors (x1,x2,x3,x4) and the y has 3 possible states (0,1,2).

Furthermore the dataset has N total individuals (N=10930) for which I am trying to forecast the different betas, and in a simple logit model I am able to map the right betas to the right person in the “mu” variable using an array that maps every row in my dataset to their “owner”.

Now my approach is something like this:

model = pm.Model()
with model:
    mu_common_prior = pm.Normal('common_mu', mu=0., sd=100**2, shape=(4, 3))
    sd_common_prior = pm.HalfCauchy('common_sd', 5, shape=(4, 3))
    beta = pm.Normal('beta', mu=mu_prior, sd=sd_prior, shape=(N, 4, 3))

    mu =[individual_indexes], df[['x1', 'x2', 'x3', 'x4']].values)

    p = T.nnet.softmax(mu)
    y = pm.Categorical('y', p=p, observed=df['y'].values)
    res =, callbacks=[CheckParametersConvergence()])
    trace = res.sample(1000)

but unfortunately the call fails miserably complaining about the shapes of my data.

shapes (43720,3) and (10930,4) not aligned: 3 (dim 1) != 10930 (dim 0)

any thoughts on what is wrong with my model? thanks

So you need p and mu has a shape of (N, 3), since your predictors df[['x1', 'x2', 'x3', 'x4']].values has a shape of (N, 4), you need a beta with a shape of (4, 3) to get there.

What I would do in this case:

with pm.Model() as model:
    mu_common_prior = pm.Normal('common_mu', mu=0., sd=100**2)
    sd_common_prior = pm.HalfCauchy('common_sd', 5)
    beta = pm.Normal('beta', mu=mu_prior, sd=sd_prior, shape=(4, 3))

    mu =[['x1', 'x2', 'x3', 'x4']].values, beta)

    p = T.nnet.softmax(mu)
    y = pm.Categorical('y', p=p, observed=df['y'].values)

Thanks for the reply. However I am not entirely sure it works; given that the formulation of a multinomial logit is

y = \beta_{x_i,k_i} \cdot x_i

  1. I am not 100% sure that we can use the same hyper prior over betas of different output states (in your model \beta_{x1,k0} shares the same hyperprior of \beta_{x1,k1})
  2. How are you giving information about the panel nature of the data to the model? I am afraid that if you model it this way we are simply losing the “panel” level (i.e. multiple observations per person).

The multinomial logit is logit with an additional dimension, so think of it as running multiple logit and stack the prediction p_1, p_2, ... together. Your matrix multiplication should be the same as before, with the additional dimension at the end.

You can use different hyperpriors for different betas, but if your betas has the same number of rows as your observation, your model is usually overspecify and performs badly.

If you have multiple observation per person, you should add additional hierarchy to the model, similar to mixed effect model. For example, if you have information of subject nsbj < N, you can add:

with model:
   s = pm.HalfStudentT('sd_1', nu=3, sd=186)
   b = pm.Normal('b', mu = 0, sd = 1, shape=nsbj)
   r_1 = pm.Deterministic('r_1', s*b)
   p = T.nnet.softmax(mu + r_1[sbj_index])