The multinomial logit is a logit with an additional dimension, so think of it as running multiple logits and stacking the predictions p_1, p_2, ... together. Your matrix multiplication should be the same as before, with the additional dimension (one column per category) at the end.
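To make the shapes concrete, here is a minimal NumPy sketch of the "stacked logits" view. It is not PyMC code, and the names `X`, `beta`, `n_obs`, `n_pred`, `n_cat` and the random data are assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_pred, n_cat = 5, 3, 4               # hypothetical sizes

X = rng.normal(size=(n_obs, n_pred))         # design matrix, same as in a binary logit
beta = rng.normal(size=(n_pred, n_cat))      # one column of coefficients per category

mu = X @ beta                                # linear predictor, shape (n_obs, n_cat)

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

p = softmax(mu)                              # each row holds p_1, ..., p_K for one observation
print(p.shape, p.sum(axis=1))
```

Each column of `beta` plays the role of one logit's coefficient vector, and the softmax ties the columns together so that every row of `p` sums to 1.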
You can use different hyperpriors for different betas, but if your betas have as many rows as you have observations, your model is usually overspecified and performs badly.
If you have multiple observations per person, you should add an additional level of hierarchy to the model, similar to a mixed-effects model. For example, if your N observations come from nsbj < N subjects, you can add a subject-level term:
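with model:
    # Group-level scale of the subject effects
    s = pm.HalfStudentT('sd_1', nu=3, sd=186)
    # Standardized (non-centered) offsets, one per subject
    b = pm.Normal('b', mu=0, sd=1, shape=nsbj)
    # Subject effects on the linear-predictor scale
    r_1 = pm.Deterministic('r_1', s * b)
    # sbj_index maps each observation to its subject; if mu is (N, K),
    # r_1[sbj_index] needs a trailing axis that broadcasts against K
    p = T.nnet.softmax(mu + r_1[sbj_index])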
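The indexing step `r_1[sbj_index]` can be sketched in plain NumPy to check the shapes before sampling. This is only an illustration under stated assumptions: `mu` has one column per category, each subject gets one offset per category (shape `(nsbj, n_cat)`, which is a choice made here and not part of the snippet above), and `s`, `n_cat` and the random values are made-up stand-ins for draws from the model.

```python
import numpy as np

rng = np.random.default_rng(1)
nsbj, n_cat = 3, 4                           # hypothetical sizes
sbj_index = np.array([0, 0, 1, 2, 2, 2])     # subject of each observation
N = len(sbj_index)

mu = rng.normal(size=(N, n_cat))             # stand-in for the population-level predictor
s = 0.5                                      # stand-in for one draw of sd_1
b = rng.normal(size=(nsbj, n_cat))           # standardized offsets (assumed per category)
r_1 = s * b                                  # subject effects, shape (nsbj, n_cat)

# Fancy indexing repeats each subject's row once per observation of that subject
eta = mu + r_1[sbj_index]                    # shape (N, n_cat)

# Row-wise softmax, written out so the sketch stays self-contained
e = np.exp(eta - eta.max(axis=1, keepdims=True))
p = e / e.sum(axis=1, keepdims=True)
print(eta.shape, p.sum(axis=1))
```

Observations 0 and 1 belong to the same subject, so they receive the same offset row. A single scalar offset per subject added to every category would cancel inside the softmax, which is why the sketch uses one offset per category.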