The large number of divergences means you should not trust the posterior you are currently getting. I increased the number of tuning samples and the target acceptance rate, which eliminated the divergences:
trace = pm.sample(tune=2000, target_accept=0.9)
However, it doesn’t change the estimates much.
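As a quick sanity check (a minimal sketch, assuming trace is the InferenceData object returned by pm.sample in PyMC v4+), you can count the divergent transitions directly:

# Divergent transitions are recorded per draw in sample_stats;
# the total should be 0 before you trust the posterior.
n_divergences = int(trace.sample_stats.diverging.sum())
print(f"divergences: {n_divergences}")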
If you inspect the summary of respondent_prefs, you should see that they are pretty close to the individual rows of your data. But the hyperparameter pop_prefs is not close to the grand averages, as you note. You are correct that pop_prefs characterizes the preferences of the population, but it does not reflect the grand means. Why? Because no individual respondent responds in a way that consistently reproduces the population means (i.e., you have heterogeneity within your sample).

Imagine we have only 2 respondents. If one respondent only produces STRONGLY_PREFER_A responses and a second respondent only produces STRONGLY_PREFER_B responses, what should we expect the posterior of the hyperparameter to be? Half the responses we see are STRONGLY_PREFER_A. However, half of our respondents produced it with probability 1.0 and half produced it with probability 0.0. So we should be very uncertain about the population-level probability of a STRONGLY_PREFER_A tendency; in particular, we should not be convinced that the population parameter associated with a STRONGLY_PREFER_A response is definitely near 0.5. (A small simulation of this scenario follows below.)

To confirm this, you can look at the full posterior rather than the means produced by az.summary(). Specifically, you can call:
az.plot_posterior(trace, var_names=["pop_prefs"])
This shows a posterior consistent with a large degree of uncertainty about the population (hyper)parameters.
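To make the two-respondent thought experiment concrete, here is a minimal sketch of that scenario. The model structure (a Dirichlet hyperprior over population preferences with per-respondent Dirichlet preferences) and all names here are my illustrative assumptions, not necessarily your exact model, and it assumes PyMC v4+:

import arviz as az
import numpy as np
import pymc as pm

# Two toy respondents: one always answers category 0 (STRONGLY_PREFER_A),
# the other always answers category 2 (STRONGLY_PREFER_B).
resp_idx = np.repeat([0, 1], 20)                       # respondent per answer
answers = np.concatenate([np.zeros(20, int), np.full(20, 2)])

with pm.Model() as toy:
    # Population-level preference probabilities (the hyperparameter).
    pop_prefs = pm.Dirichlet("pop_prefs", a=np.ones(3))
    # How tightly individuals cluster around the population preferences.
    conc = pm.Gamma("conc", alpha=2.0, beta=0.1)
    # Each respondent's own preference probabilities.
    respondent_prefs = pm.Dirichlet(
        "respondent_prefs", a=pop_prefs * conc, shape=(2, 3)
    )
    pm.Categorical("obs", p=respondent_prefs[resp_idx], observed=answers)
    toy_trace = pm.sample(tune=2000, target_accept=0.9)

az.plot_posterior(toy_trace, var_names=["pop_prefs"])

Even though exactly half of all responses are STRONGLY_PREFER_A, the posterior for pop_prefs should stay wide, mirroring the uncertainty you see in your own plot.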
This is also why duplicating your data many times doesn’t change things much at the population level. The heterogeneity among your respondents is roughly the same when you simply cut and paste your 12 respondents over and over.
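You can see the intuition with simple descriptive statistics (hypothetical numbers, reusing the two toy respondents above):

import numpy as np

# Per-respondent proportion of STRONGLY_PREFER_A answers.
props = np.array([1.0, 0.0])          # one all-A respondent, one all-B
props_x10 = np.tile(props, 10)        # "duplicate" them ten times

# The between-respondent spread is unchanged, so the population-level
# posterior has little reason to narrow.
print(props.std(), props_x10.std())   # both print 0.5

Duplication multiplies the evidence about each individual respondent, but it adds no new information about how respondents vary, which is exactly what the hyperparameter is estimating.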
