How to improve a multinomial model from BDA3

So I am reading Bayesian Data Analysis (BDA3) and I want to know why the effective sample size is so low for this model. The example is taken from chapter 3 (the pre-election polling example).

In late October, 1988, a survey was conducted by CBS News of 1447 adults in the United States to find out their preferences in the upcoming presidential election. Out of 1447 persons, y_1 = 727 supported George Bush, y_2 = 583 supported Michael Dukakis, and y_3 = 137 supported other candidates or expressed no opinion. Assuming no other information on the respondents, the 1447 observations are exchangeable. If we also assume simple random sampling (that is, 1447 names ‘drawn out of a hat’), then the data (y_1, y_2, y_3) follow a multinomial distribution, with parameters (\theta_1, \theta_2, \theta_3), the proportions of Bush supporters, Dukakis supporters, and those with no opinion in the survey population. An estimand of interest is \theta_1 - \theta_2, the population difference in support for the two major candidates.
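
For reference, the book's noninformative uniform prior makes the posterior proportional to the multinomial likelihood,

p(\theta \mid y) \propto \theta_1^{y_1} \theta_2^{y_2} \theta_3^{y_3}, \qquad \theta_1 + \theta_2 + \theta_3 = 1,

which is the kernel of a Dirichlet(y_1 + 1, y_2 + 1, y_3 + 1) distribution.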

Ok, let’s put this in code using pymc3. First, import the required libraries.

import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm

The data.

# poll counts: Bush, Dukakis, other candidates / no opinion
obs = np.array([727, 583, 137])
bush_supp = obs[0] / obs.sum()
dukakis_supp = obs[1] / obs.sum()
other_supp = obs[2] / obs.sum()

arr = np.array([bush_supp, dukakis_supp, other_supp])
print('The proportion array is', arr)
print('The supporters array is', obs)
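
For later comparison, the raw difference in sample proportions is already close to 0.10:

print('Observed difference:', bush_supp - dukakis_supp)
# (727 - 583) / 1447 ≈ 0.0995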

And now, the model:

with pm.Model() as model_3:
    # three independent uniform priors, one per category
    theta1 = pm.Uniform('theta1', lower=0, upper=1)
    theta2 = pm.Uniform('theta2', lower=0, upper=1)
    theta3 = pm.Uniform('theta3', lower=0, upper=1)
    # or: theta = pm.Uniform('theta', lower=0, upper=1, shape=3)

    post = pm.Multinomial('post', n=1447, p=[theta1, theta2, theta3], shape=3, observed=obs)

    diff = pm.Deterministic('diff', theta1 - theta2)

diff is the quantity I am looking for. And now, sample:

with model_3:
    trace_3 = pm.sample(draws=50_000, tune=10_000)

Look at the number of draws; it will be important later. The trace plot is

[trace plot image]

and the summary is

|        | mean     | sd       | mc_error | hpd_2.5  | hpd_97.5 | n_eff      | Rhat     |
|--------|----------|----------|----------|----------|----------|------------|----------|
| theta1 | 0.752658 | 0.185972 | 0.007974 | 0.393953 | 0.999931 | 324.360295 | 1.009122 |
| theta2 | 0.607154 | 0.153961 | 0.006487 | 0.300763 | 0.850772 | 333.936984 | 1.009070 |
| theta3 | 0.143408 | 0.038012 | 0.001552 | 0.066682 | 0.206904 | 357.452126 | 1.008408 |
| diff   | 0.145503 | 0.050030 | 0.001494 | 0.048851 | 0.237703 | 705.530788 | 1.003811 |

Look at the effective sample size: out of 50,000 draws, only about 330 are effectively used. How is that possible? And then there are the divergences…

According to the book, this is a simple example of a multinomial model.

So my questions are:

- Can this be improved? The mean of diff is 0.14, while the book says it is close to 0.10.
- Am I doing something wrong here? The book says "with a noninformative uniform prior distribution on θ", and I want to keep that instruction, although if anybody has a better prior, of course I will use it.

That’s because your model is overparameterized. In a multinomial distribution, the parameter p is a simplex (all elements are non-negative and sum to 1). In this case, you only need to know the first two elements of p, because the third one is p_3 = 1 - p_1 - p_2. With three independent Uniform priors, nothing ties the thetas to the simplex, so the extra, unconstrained degree of freedom leaves the sampler a ridge to wander along, which is what the low effective sample size and the divergences are telling you.

You can try a Dirichlet prior instead; with a = 1 it is exactly the noninformative uniform distribution on the simplex, so it keeps the book's prior:

with pm.Model() as m:
    # a flat Dirichlet (a = 1) is the uniform prior on the simplex
    theta = pm.Dirichlet('theta', a=np.ones_like(obs))
    post = pm.Multinomial('post', n=obs.sum(), p=theta, observed=obs)
    trace = pm.sample(1000, tune=5000)
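
One way to also get diff is to define it inside the model before sampling; for example, extending the code above:

with pm.Model() as m:
    theta = pm.Dirichlet('theta', a=np.ones_like(obs))
    post = pm.Multinomial('post', n=obs.sum(), p=theta, observed=obs)
    # same estimand as in the question: Bush minus Dukakis support
    diff = pm.Deterministic('diff', theta[0] - theta[1])
    trace = pm.sample(1000, tune=5000)

pm.summary(trace)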

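Because the Dirichlet prior is conjugate to the multinomial likelihood, the posterior here is Dirichlet(y + 1) in closed form, so the result can be cross-checked without MCMC at all. A quick sketch in plain NumPy:

import numpy as np

obs = np.array([727, 583, 137])
rng = np.random.default_rng(0)

# posterior under the flat Dirichlet(1, 1, 1) prior is Dirichlet(y + 1)
draws = rng.dirichlet(obs + 1, size=100_000)
diff = draws[:, 0] - draws[:, 1]
print(diff.mean())  # ≈ (728 - 584) / 1450 ≈ 0.099

The analytic posterior mean of \theta_1 - \theta_2 is (728 - 584)/1450 ≈ 0.099, matching the book's "close to 0.10", unlike the 0.14 from the overparameterized model.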