So I am reading Bayesian Data Analysis (BDA3) and I want to know why the effective sample size is so low for this model. The example is taken from chapter 3 (Example. Pre-election polling).
In late October, 1988, a survey was conducted by CBS News of 1447 adults in the United States to find out their preferences in the upcoming presidential election. Out of 1447 persons, y_1 = 727 supported George Bush, y_2 = 583 supported Michael Dukakis, and y_3 = 137 supported other candidates or expressed no opinion. Assuming no other information on the respondents, the 1447 observations are exchangeable. If we also assume simple random sampling (that is, 1447 names ‘drawn out of a hat’), then the data (y_1, y_2, y_3) follow a multinomial distribution, with parameters (\theta_1, \theta_2, \theta_3), the proportions of Bush supporters, Dukakis supporters, and those with no opinion in the survey population. An estimand of interest is \theta_1 - \theta_2, the population difference in support for the two major candidates.
Ok, let’s put this in code using pymc3
. First, import the required libraries.
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
The data.
obs = np.array([727, 583, 137])
bush_supp = obs[0] / sum(obs)
dukakis_supp = obs[1] / sum(obs)
other_supp = obs[2] / sum(obs)
arr = np.array([bush_supp, dukakis_supp, other_supp])
print('The proportion array is', arr)
print('The supporters array is', obs)
And now the model
with pm.Model() as model_3:
theta1 = pm.Uniform('theta1', lower=0, upper=1)
theta2 = pm.Uniform('theta2', lower=0, upper=1)
theta3 = pm.Uniform('theta3', lower=0, upper=1)
# or theta1 = pm.Uniform('theta1', lower=0, upper=1, shape=3)
post = pm.Multinomial('post', n=1447, p=[theta1, theta2, theta3], shape=3, observed=obs)
diff = pm.Deterministic('diff', theta1 - theta2)
diff
is what I am looking for. And now, sample.
with model_3:
trace_3 = pm.sample(draws=50_000, tune=10_000)
Look at the draws, it will be important later. The trace plot is
and the summary is
mean sd mc_error hpd_2.5 hpd_97.5 n_eff Rhat
theta1 0.752658 0.185972 0.007974 0.393953 0.999931 324.360295 1.009122
theta2 0.607154 0.153961 0.006487 0.300763 0.850772 333.936984 1.009070
theta3 0.143408 0.038012 0.001552 0.066682 0.206904 357.452126 1.008408
diff 0.145503 0.050030 0.001494 0.048851 0.237703 705.530788 1.003811
Look at the effective sample size, 50000 draws and only 330 are used. How is that possible? And the number of divergences…
According to the book, this is a simple example of a multinomial model.
So my questions are:
- Can this be improved? The mean of
diff
is 0.14 and the book says it is close to 0.10. - Am I doing something wrong here? The book says With a noninformative uniform prior distribution on θ and I want to keep that instruction, although if anybody has a better prior… of course I will add it.