Multivariate-multinomial logistic regression

I’m not sure you should do that: you’re forcing a decision boundary onto the posterior proportions of your model by simulating new observations, `yl`, with `y_pred2 = np.argmax(θ_mean, axis=1)`. If that’s really what you’re after, just take posterior predictive samples of `yl` and plot them raw. That way, your predictions will include all the uncertainty present in your model, whereas your current approach doesn’t take into account the sampling uncertainty in the likelihood.
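
For example, something like this (a minimal sketch with made-up data; I’m assuming a PyMC3-style API and a Multinomial likelihood named `yl`, as in your model -- adapt the shapes and names to your case):

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Fake data: N observations, K categories, P predictors
rng = np.random.default_rng(42)
N, K, P = 50, 3, 2
X = rng.normal(size=(N, P))
true_beta = rng.normal(size=(P, K))
logits = X @ true_beta
p_true = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
y_obs = np.array([rng.multinomial(1, p) for p in p_true])  # one-hot counts

with pm.Model() as model:
    beta = pm.Normal("beta", 0.0, 1.0, shape=(P, K))
    theta = pm.Deterministic("theta", tt.nnet.softmax(tt.dot(X, beta)))
    yl = pm.Multinomial("yl", n=1, p=theta, observed=y_obs)
    trace = pm.sample(1000, tune=1000, return_inferencedata=False)
    # Each posterior predictive draw simulates a whole new dataset, so it
    # carries the sampling uncertainty of the likelihood, not just the
    # uncertainty in theta:
    post_checks = pm.sample_posterior_predictive(trace, var_names=["yl"])

print(post_checks["yl"].shape)  # (n_pp_samples, N, K)
```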

That’s what I do in my approach, but I go one step further and reverse-engineer the latent proportions associated with the new, predicted data: that’s what `post_checks["yl"].mean(0)` does. This takes the whole uncertainty into account and lets you see how confident the model is in each category, by looking at how the proportions split up across categories.
Note that the average is computed on axis 0, which is the sample axis, not the category axis. So we’re computing the frequency of each category within each observation, not averaging across categories.
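
Concretely, continuing the sketch above (still assuming one-hot Multinomial draws of shape `(n_pp_samples, N, K)`):

```python
# Average the simulated one-hot draws over axis 0 (the sample axis).
# This recovers the proportions implied by the posterior predictive:
theta_implied = post_checks["yl"].mean(0)   # shape (N, K)

print(theta_implied[:5].round(2))     # per-category proportions, row by row
print(theta_implied.sum(axis=1)[:5])  # each row sums to ~1
```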

Hope this is clearer :slight_smile: