I think there might be some confusion, because the hyperparameters pop_prefs
are the concentration parameters of the Dirichlet prior. I could be mistaken, but I don’t think a concentration parameter \alpha_i directly gives the share of draws from the distribution that will be of class i (although it should be possible to recover this share with some algebra). Furthermore, the Dirichlet alphas aren’t bounded between 0 and 1, so I don’t know if modeling them as coming from a Dirichlet makes sense.
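For what it’s worth, the mean of a Dirichlet does give those shares: E[p_i] = \alpha_i / \sum_j \alpha_j. A quick numpy sanity check (the alphas here are made-up numbers, just for illustration):

```python
import numpy as np

alpha = np.array([2.0, 5.0, 3.0])      # hypothetical concentration parameters
expected_share = alpha / alpha.sum()   # mean of Dirichlet(alpha): [0.2, 0.5, 0.3]

# Monte Carlo check: empirical per-class shares match alpha / sum(alpha)
rng = np.random.default_rng(0)
draws = rng.dirichlet(alpha, size=100_000)
print(expected_share)      # [0.2 0.5 0.3]
print(draws.mean(axis=0))  # close to [0.2 0.5 0.3]
```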
I tried running your model again, using a hierarchical normal prior over the multinomial probabilities, which I then put into a softmax to ensure everything adds up to 1 row-wise. I figure this makes everything easier to interpret, because we only have to reason about the softmax, not about the Dirichlet distribution.
import pandas as pd
import pymc as pm

options_idx, options = pd.factorize(df.columns)
respondents_idx, respondents = pd.factorize(df.index)
n_respondents, n_options = df.shape
coords = {'options': options, 'respondents': respondents}

with pm.Model(coords=coords) as model:
    pop_mu = pm.Normal('pop_mu', mu=0.0, sigma=1.0, dims=['options'])
    pop_sigma = pm.HalfCauchy('pop_sigma', beta=0.8, dims=['options'])

    # Non-centered parameterization of the per-respondent effects
    respondent_offset = pm.Normal('respondent_offset', mu=0.0, sigma=1.0,
                                  dims=['respondents', 'options'])
    respondent_prefs = pm.Deterministic('respondent_prefs',
                                        pop_mu + pop_sigma * respondent_offset,
                                        dims=['respondents', 'options'])

    responses = pm.Multinomial(
        "responses",
        n=df.sum(axis=1).values,  # total number of responses per respondent
        p=pm.math.softmax(respondent_prefs, axis=1),
        dims=['respondents', 'options'],
        observed=df,
    )

    prior = pm.sample_prior_predictive()
    trace = pm.sample(init='adapt_diag_grad', target_accept=0.9)
This sampled without divergences (with init='adapt_diag_grad' and target_accept=0.9), although so did your original model with default settings when I tried it, so I’m not sure divergences were the issue. Anyway, I get the following estimates for the population averages:
                    mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
pop_mu[STRONG_A]  -0.759  0.544  -1.794    0.242      0.020    0.014     722.0    1501.0   1.00
pop_mu[A]         -0.486  0.483  -1.378    0.419      0.020    0.014     599.0    1250.0   1.00
pop_mu[NO_PREF]   -0.427  0.501  -1.354    0.520      0.019    0.013     693.0    1244.0   1.01
pop_mu[B]          0.428  0.462  -0.411    1.304      0.019    0.013     611.0    1121.0   1.01
pop_mu[STRONG_B]   1.048  0.469   0.182    1.943      0.019    0.014     585.0    1164.0   1.00
To recover estimated population proportions, I put the means into a softmax and compare with the data:
from scipy.special import softmax
import arviz as az

print(
    (df.mean() / df.mean().sum())
    .to_frame()
    .rename(columns={0: 'data_proportion'})
    .assign(estimated_proportion=softmax(az.summary(trace, var_names=['pop_mu'])['mean']).values)
    .to_string()
)
Which gives:
          data_proportion  estimated_proportion
STRONG_A         0.089908              0.076470
A                0.100917              0.100474
NO_PREF          0.124771              0.106580
B                0.238532              0.250610
STRONG_B         0.445872              0.465866
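One caveat with this comparison: softmax is nonlinear, so the softmax of the posterior means is only a plug-in estimate and can differ slightly from the posterior mean of the proportions. If you want the latter, apply the softmax to each draw and then average. A sketch with simulated stand-in draws (in real use, pop_mu_draws would come from trace.posterior['pop_mu'].values):

```python
import numpy as np
from scipy.special import softmax

rng = np.random.default_rng(0)
# Stand-in for trace.posterior['pop_mu'].values, shape (chain, draw, options);
# the location values below are just placeholders resembling the summary above.
pop_mu_draws = rng.normal(loc=[-0.76, -0.49, -0.43, 0.43, 1.05], scale=0.5,
                          size=(4, 1000, 5))

prop_draws = softmax(pop_mu_draws, axis=-1)          # map each draw onto the simplex
estimated_proportion = prop_draws.mean(axis=(0, 1))  # posterior mean of proportions

print(estimated_proportion.sum())  # close to 1.0, up to floating point error
```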
These seem reasonable, so I feel like it’s on the right track. Again, there might be a way to do it by interpreting the Dirichlet concentrations, but I would need to hit the books to figure it out.