(newbie) using Dirichlet for partial pooling for Likert-like survey responses

I think there might be some confusion because the hyperparameter pop_prefs is the concentration parameter of the Dirichlet prior. I could be mistaken, but I don’t think a concentration parameter \alpha_i gives the percentage share of draws from the distribution that will be of class i (although it might be possible to recover this percentage with some algebra). Furthermore, the Dirichlet alphas aren’t bounded between 0 and 1, so I don’t know if modeling them as coming from a Dirichlet makes sense.
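For what it's worth, the percentage share is recoverable: the mean of a Dirichlet(\alpha) is E[p_i] = \alpha_i / \sum_j \alpha_j. A quick numpy check (arbitrary \alpha, unrelated to your data):

```python
import numpy as np

rng = np.random.default_rng(42)
alpha = np.array([1.0, 2.0, 3.0, 4.0])  # arbitrary concentrations (only constraint: alpha_i > 0)

# empirical share of mass on each class across many Dirichlet draws
draws = rng.dirichlet(alpha, size=100_000)
empirical = draws.mean(axis=0)

# analytic mean of Dirichlet(alpha)
analytic = alpha / alpha.sum()  # [0.1, 0.2, 0.3, 0.4]

print(empirical, analytic)  # both close to [0.1, 0.2, 0.3, 0.4]
```

So the alphas do encode expected proportions, just not on the 0-1 scale directly.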

I tried running your model again, using a hierarchical normal prior over the multinomial probabilities, which I then pass through a softmax to ensure everything adds up to 1 row-wise. I figure this makes everything easier to interpret, because we only have to reason about the softmax, and not about the Dirichlet distribution.

import pandas as pd
import pymc as pm

options_idx, options = pd.factorize(df.columns)
respondents_idx, respondents = pd.factorize(df.index)
n_respondents, n_options = df.shape

# total number of responses per respondent (row sums of the count table)
df_n = df.sum(axis=1)

coords = {'options': options, 'respondents': respondents}

with pm.Model(coords=coords) as model:
    pop_mu = pm.Normal('pop_mu', mu=0.0, sigma=1.0, dims=['options'])
    pop_sigma = pm.HalfCauchy('pop_sigma', beta=0.8, dims=['options'])
    respondent_offset = pm.Normal('respondent_offset', mu=0.0, sigma=1.0, dims=['respondents', 'options'])
    # non-centered parameterization: per-respondent preferences on the softmax (logit-like) scale
    respondent_prefs = pm.Deterministic('respondent_prefs', pop_mu + pop_sigma * respondent_offset, dims=['respondents', 'options'])

    responses = pm.Multinomial(
        "responses",
        n=df_n.values,
        p=pm.math.softmax(respondent_prefs, axis=1),
        dims=['respondents', 'options'],
        observed=df
    )

    prior = pm.sample_prior_predictive()
    trace = pm.sample(init='adapt_diag_grad', target_accept=0.9)

This sampled without divergences (with init='adapt_diag_grad' and target_accept=0.9), although so did your original model (with default settings) when I tried it, so I'm not sure how much that tells us. Anyway, I get the following estimates for the population averages:

                   mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
pop_mu[STRONG_A] -0.759  0.544  -1.794    0.242      0.020    0.014     722.0    1501.0   1.00
pop_mu[A]        -0.486  0.483  -1.378    0.419      0.020    0.014     599.0    1250.0   1.00
pop_mu[NO_PREF]  -0.427  0.501  -1.354    0.520      0.019    0.013     693.0    1244.0   1.01
pop_mu[B]         0.428  0.462  -0.411    1.304      0.019    0.013     611.0    1121.0   1.01
pop_mu[STRONG_B]  1.048  0.469   0.182    1.943      0.019    0.014     585.0    1164.0   1.00

To recover estimated population proportions, I pass the posterior means through a softmax and compare with the proportions in the data:

import arviz as az
from scipy.special import softmax

print((df.mean() / df.mean().sum())
     .to_frame()
     .rename(columns={0:'data_proportion'})
     .assign(estimated_proportion = softmax(az.summary(trace, var_names=['pop_mu'])['mean']))
     .to_string())

Which gives:

          data_proportion  estimated_proportion
STRONG_A         0.089908              0.076470
A                0.100917              0.100474
NO_PREF          0.124771              0.106580
B                0.238532              0.250610
STRONG_B         0.445872              0.465866

These seem reasonable, so I feel like it’s on the right track. Again, there might be a way to do it by interpreting the Dirichlet concentrations, but I would need to hit the books to figure it out.
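If you did want to read population proportions straight off the Dirichlet model, I believe it would amount to normalizing each posterior draw of the concentration vector, since the mean of Dirichlet(\alpha) is \alpha / \sum \alpha. A sketch of the idea below, with a gamma sample standing in for the posterior draws of pop_prefs (in practice you'd pull them from the trace, e.g. via trace.posterior):

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in for posterior draws of the concentration vector pop_prefs,
# shape (n_draws, n_options); real draws would come from the trace
alpha_draws = rng.gamma(shape=2.0, scale=2.0, size=(4000, 5))

# each draw's implied population proportions: the mean of Dirichlet(alpha)
prop_draws = alpha_draws / alpha_draws.sum(axis=1, keepdims=True)

# posterior mean of the implied proportions (each row of prop_draws sums to 1)
est_props = prop_draws.mean(axis=0)
print(est_props)
```

But I haven't verified this against your model, so take it as a starting point rather than an answer.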
