Multivariate-multinomial logistic regression

Hey Alex,
You have explained the various ways to do posterior predictive checks very neatly in your notebook on multinomial regression. I have implemented some aspects of it in my code and have the following follow-up questions for you.

  1. I have 4 features in my model, and I made the counterfactual plots for each of the parameters in turn while holding the others at their mean (0). Below is the plot for parameter 1.


    For all four cases I get an accuracy of 66%. I suspect the low accuracy is due to my features: either some of them are correlated, or I have too few features for this regression. I tried changing the standard deviation of the priors on alfa and beta, but that didn't help; NUTS had no convergence problems with std dev = 1.0. Also, the three types of observations I'm trying to predict (type-1, type-2 and type-3) are not equally represented in my data: 60% is type-1, 25% is type-2 and 15% is type-3. Could this imbalance also be contributing to the low accuracy?

  2. Next, I ran a posterior predictive check (PPC) on the 1,800 raw data points (with the default 20,000 samples from the posterior). Below is a plot of the mean of the 20,000 predictions against the 1,800 observations. As you can see, the predictions are really bad: the mean of the predictions lies between type-1 (0) and type-2 (1). What could be the reason for this?
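
On point 1: one way to judge whether 66% is low, given the 60/25/15 split, is to compare it with the accuracy of always predicting the majority class, and to also report a class-balanced metric (mean per-class recall). A minimal NumPy sketch, assuming integer labels 0, 1, 2 for type-1, type-2 and type-3; the helper names and the label array are hypothetical, built only to mirror the stated proportions over 1,800 points:

```python
import numpy as np


def majority_baseline_accuracy(y_true):
    """Accuracy of a classifier that always predicts the most frequent class."""
    counts = np.bincount(y_true)
    return counts.max() / counts.sum()


def balanced_accuracy(y_true, y_pred, n_classes):
    """Mean of per-class recall, so rare classes weigh as much as common ones."""
    recalls = [np.mean(y_pred[y_true == k] == k) for k in range(n_classes)]
    return float(np.mean(recalls))


# Hypothetical labels following the 60/25/15 split (1,080 + 450 + 270 = 1,800).
y_true = np.repeat([0, 1, 2], [1080, 450, 270])
# A "model" that always answers type-1 (label 0).
y_majority = np.zeros_like(y_true)

print(majority_baseline_accuracy(y_true))
print(balanced_accuracy(y_true, y_majority, 3))
```

With this split, always answering type-1 already gives 0.6 accuracy while its balanced accuracy is only 1/3, since it never recovers type-2 or type-3. Comparing the model's plain and balanced accuracy against these two numbers shows how much of the 66% comes from the imbalance alone.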

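On point 2: assuming a Categorical likelihood, each posterior predictive draw is a class label (0, 1 or 2), and the mean of those labels is not itself a label. Even a well-calibrated model lands between 0 and 1 here: predictive probabilities of (0.60, 0.25, 0.15) give an expected label of 0 × 0.60 + 1 × 0.25 + 2 × 0.15 = 0.55. A more informative summary is the per-observation frequency of each class across draws, plus the modal class. A minimal NumPy sketch, assuming the draws for the observed variable have been collected into an integer array of shape (n_draws, n_obs) (how to extract it depends on the PyMC version); the helper name and the toy draws are hypothetical:

```python
import numpy as np


def summarize_categorical_ppc(draws, n_classes):
    """Summarize integer posterior predictive draws of shape (n_draws, n_obs).

    Returns per-observation class frequencies (n_obs, n_classes)
    and the modal (most frequently drawn) class for each observation.
    """
    # Broadcast to (n_draws, n_obs, n_classes) booleans, then average over draws.
    freqs = (draws[..., None] == np.arange(n_classes)).mean(axis=0)
    modal = freqs.argmax(axis=1)
    return freqs, modal


# Toy draws: 4 posterior draws for 2 observations.
draws = np.array([
    [0, 2],
    [0, 2],
    [2, 2],
    [0, 1],
])

mean_label = draws.mean(axis=0)  # averaging labels, as in the plot
freqs, modal = summarize_categorical_ppc(draws, n_classes=3)

print(mean_label)
print(freqs)
print(modal)
```

For the first toy observation the mean label is 0.5, between type-1 and type-2, although no draw predicted type-2; the frequencies (0.75, 0, 0.25) and the modal class 0 describe the prediction much better. Comparing `modal` (or the full frequency table) with the observed labels gives a per-observation check.
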
Any help or suggestions are welcome.