# Partial Pooling for Election Polls

Hey everyone, new PyMC3 user here. I work mostly with machine learning and I’ve been trying to learn as much as I can about probabilistic programming in my free time. Anyway, I put together a model for the Georgia runoff election in the United States, based on the partial pooling baseball example on the website, and I’m curious to know whether it seems reasonable to the pros on here.

My idea is that the true vote share for the state is unobservable, but each poll can give us a glimpse. However, each poll should have its own distribution to account for house effects, sampling bias, and so on. This is similar to the Efron and Morris baseball example, where each player has their own distribution that informs the distribution of the population of professional baseball players.

The data

samplesize = [605, 1250, 500, 800, 1377, 1450, 583, 1064, 300, 1500]
num_votes = [296, 631, 247, 404, 703, 717, 312, 499, 143, 734]

And the model

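import pymc3 as pm
import theano.tensor as tt

# alpha and beta are the hyperparameters of the prior on phi;
# their values are not shown in this post and must be defined beforehand.
with pm.Model() as warnock_model:
    # State-level vote share shared by all polls
    phi = pm.Beta('phi', alpha=alpha, beta=beta)

    # Concentration: larger kappa pulls the polls closer to phi
    kappa_log = pm.HalfNormal('kappa_log', sigma=1)
    kappa = pm.Deterministic('kappa', tt.exp(kappa_log))

    # One latent vote share per poll
    thetas = pm.Beta(
        'thetas',
        alpha=phi * kappa,
        beta=(1.0 - phi) * kappa,
        shape=len(samplesize),
    )

    # Likelihood: observed vote counts in each poll
    y = pm.Binomial(
        'y',
        n=samplesize,
        p=thetas,
        observed=num_votes,
    )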

This model seems reasonable to me. Plotting the forest plot for each pollster with the actual poll result in orange looks good.
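
To make the partial pooling concrete, here is a minimal sketch of the shrinkage this kind of model produces. It relies on Beta-Binomial conjugacy: conditional on `phi` and `kappa`, each poll's share has a Beta(phi*kappa + y, (1 - phi)*kappa + n - y) posterior. The fixed values of `phi` and `kappa` and the helper `shrunk_mean` are assumptions for illustration only; in the real model both hyperparameters are sampled.

```python
# Poll data from the post.
samplesize = [605, 1250, 500, 800, 1377, 1450, 583, 1064, 300, 1500]
num_votes = [296, 631, 247, 404, 703, 717, 312, 499, 143, 734]

# HYPOTHETICAL hyperparameter values, fixed only for illustration;
# the actual model samples phi and kappa instead of fixing them.
phi = 0.5
kappa = 200.0


def shrunk_mean(y, n, phi, kappa):
    """Mean of the conditional posterior Beta(phi*kappa + y, (1 - phi)*kappa + n - y)."""
    return (phi * kappa + y) / (kappa + n)


raw = [y / n for y, n in zip(num_votes, samplesize)]
pooled = [shrunk_mean(y, n, phi, kappa) for y, n in zip(num_votes, samplesize)]

for n, r, p in zip(samplesize, raw, pooled):
    print(f"n={n:5d}  raw share={r:.3f}  partially pooled={p:.3f}")
```

Each pooled estimate is a weighted average of the raw poll share and `phi`, with weight n / (n + kappa) on the poll, so small polls are pulled toward the state-level share more strongly than large ones.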

And the posterior for phi, taken as the estimated state vote share, looks realistic to me as well.

So my first question is: what types of posterior predictive checks should I do?

Then, if I wanted to weight the polls by time, what is the best way to do that? My thinking is to artificially decrease the sample size by some function of time, which should increase the uncertainty for polls that are far from the election date. However, I’m not sure whether that is a commonly accepted practice.

Hi Florian,
And welcome
I frequently work on such models, so I can give you some pointers for resources:

• I put all my models here. Note that I work mainly on French elections, which have many parties, with a changing identity, so that adds dimensions and complications compared to the US, but the ideas stay the same. I gave a talk at PyMCon this year, walking through this exact model.
• I find the most satisfying conceptual approach is Linzer’s dynamic Bayesian model, extended this year by Gelman, Heidemanns and Morris for The Economist’s model. They use an MvGRW to handle the temporal correlations between polls, and the hierarchical structure is there to partially pool between states (but you probably don’t care about that last part in your case).
• As a matter of fact, I interviewed Andrew and Merlin on my podcast just before the US election this year

That’s your opportunity to be creative! You can use ArviZ’s functions for generic posterior plots, but, ultimately, each model needs its own posterior visualizations, especially big, complex, high-dimensional models.
You’ll see in my repo and my talk that I used custom plots to better understand the model. Feel free to reuse this if useful. Again though, each model requires at least partly custom viz, as it’s usually tailored to your use-case and domain knowledge.
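
As one possible starting point for a posterior predictive check, the sketch below simulates replicated poll counts and compares them with the observed counts. It is a stand-in, not the thread's model: the per-poll posterior draws come from a flat-prior Beta posterior (an assumption made so the example runs with the standard library only), and `draw_binomial` is a hypothetical helper. With a fitted PyMC3 model, `pm.sample_posterior_predictive` would supply the replicated counts instead.

```python
import random

random.seed(0)

# Poll data from the post.
samplesize = [605, 1250, 500, 800, 1377, 1450, 583, 1064, 300, 1500]
num_votes = [296, 631, 247, 404, 703, 717, 312, 499, 143, 734]


def draw_binomial(n, p):
    """Draw one Binomial(n, p) count by summing n Bernoulli trials."""
    return sum(random.random() < p for _ in range(n))


n_draws = 200
p_values = []
for n, y in zip(samplesize, num_votes):
    exceed = 0
    for _ in range(n_draws):
        # ASSUMPTION: stand-in posterior draw from Beta(1 + y, 1 + n - y),
        # i.e. each poll analysed on its own with a flat prior.
        theta = random.betavariate(1 + y, 1 + n - y)
        y_rep = draw_binomial(n, theta)
        exceed += y_rep >= y
    # Share of replicated counts at least as large as the observed count.
    p_values.append(exceed / n_draws)

for n, y, p in zip(samplesize, num_votes, p_values):
    print(f"n={n:5d}  observed={y:4d}  P(y_rep >= y)={p:.2f}")
```

Values close to 0 or 1 would flag polls that the model struggles to reproduce; values near the middle indicate that the observed count looks like a typical replicated count.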

That’s indeed a good rule of thumb I think! I actually used it myself in the model in my repo, and it does work. If it’s your first model, it’s a good idea to start simple, with good-enough heuristics, and then build on it iteratively. In my case, for instance, I plan to add the clean time-dependency part in the next iteration of my model. Start small and grow with it
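
The sample-size heuristic discussed here can be sketched as follows. Everything specific is an assumption for illustration: the post gives no poll dates, so `days_out` is made up, and the exponential decay with a 14-day `half_life` is just one possible choice of weighting function.

```python
# Poll data from the post.
samplesize = [605, 1250, 500, 800, 1377, 1450, 583, 1064, 300, 1500]
num_votes = [296, 631, 247, 404, 703, 717, 312, 499, 143, 734]

# HYPOTHETICAL: days between each poll and Election Day (not given in the post).
days_out = [30, 25, 21, 18, 14, 10, 7, 5, 3, 1]
# HYPOTHETICAL: a poll this many days old counts as half its nominal size.
half_life = 14.0


def effective_sample_size(n, days, half_life):
    """Shrink a poll's sample size by exponential decay in its age."""
    weight = 0.5 ** (days / half_life)
    return max(1, round(n * weight))


eff_n = [effective_sample_size(n, d, half_life) for n, d in zip(samplesize, days_out)]
# Rescale the vote counts so each poll keeps its observed share;
# the Binomial likelihood needs integer counts.
eff_votes = [round(y / n * m) for y, n, m in zip(num_votes, samplesize, eff_n)]

for d, n, m, v in zip(days_out, samplesize, eff_n, eff_votes):
    print(f"{d:2d} days out: n={n:5d} -> effective n={m:5d}, votes={v:4d}")
```

Passing `eff_n` and `eff_votes` to the Binomial likelihood in place of the raw counts widens the posterior for older polls while leaving their observed shares roughly unchanged.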

Hope this helps

Hi Alex

Thank you for taking the time to reply! I actually watched part of your talk and I have it bookmarked for later. I also looked at the dynamic Bayesian model from Linzer and started working on it, but I decided it was probably better to get a simple model done first. The only part of that model that seems mysterious to me is the reverse GRW. I like the idea of using a GRW to act almost like a Kalman filter, as in Jackman 2005, but I’m not sure I understand why Linzer would go in reverse. I guess to ensure that the national component decays to 0?

Also, and maybe this is a misunderstanding on my part, but figure 3 in the Linzer paper is concerning to me. I understand that the solid horizontal line is the actual outcome and the model is forecasting something close to that. But to me nothing in the data up to 2 months before election day seems to suggest anything that would trend that way. If I were developing this model and I saw that forecast with 2 months to go I would think that something is wrong.

My only guess as to what’s happening here, given my (poor) understanding of the model, is that it’s regressing towards the national polling average (which tends to be closer to 50%). However, this would be a bad prediction for all the “safe” states in US elections. It looks good here for the two swing states, obviously.

Anyway, thanks again for your response. It seems like I’m on the right track with a baby model on a baby election. I’ll watch your talk and see if I can pick up any tricks!

Ha ha yeah, sorry, the talk is a bit long – I talk too much

It’s to make sure the GRW “walks” towards its priors instead of away from it when one doesn’t have much data. And the nice thing is that the prior is actually the result of another model, called “the fundamentals” model.
Linzer uses the “Time for change” model, but basically it’s a regression of past election results on a domain-expert-informed selection of socio-economic variables – so, yeah, there are actually two models
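
A small simulation may help show what running the walk in reverse does. The sketch assumes the structure described in this thread: the Election Day value gets a prior centred on the fundamentals forecast, and each earlier day equals the following day plus Gaussian noise. All numeric values (`prior_mean`, `prior_sd`, `step_sd`, `n_days`) are hypothetical and on the logit scale.

```python
import random
import statistics

random.seed(1)

n_days = 60         # days are indexed 0 .. n_days; n_days is Election Day
prior_mean = 0.10   # HYPOTHETICAL fundamentals forecast (logit scale)
prior_sd = 0.05     # HYPOTHETICAL uncertainty of that forecast
step_sd = 0.02      # HYPOTHETICAL day-to-day standard deviation of the walk


def reverse_walk():
    """Simulate one path, starting at Election Day and stepping backwards."""
    path = [0.0] * (n_days + 1)
    path[n_days] = random.gauss(prior_mean, prior_sd)
    for day in range(n_days - 1, -1, -1):
        path[day] = path[day + 1] + random.gauss(0.0, step_sd)
    return path


paths = [reverse_walk() for _ in range(2000)]
sd_election_day = statistics.pstdev([p[n_days] for p in paths])
sd_day_zero = statistics.pstdev([p[0] for p in paths])

print(f"prior sd on Election Day: {sd_election_day:.3f}")
print(f"prior sd on day 0:        {sd_day_zero:.3f}")
```

Before any polls are seen, the walk is tightest on Election Day and loosest early in the campaign, which is the opposite of a forward walk whose uncertainty would keep growing toward the election.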

Indeed, nothing in the data trends that way, but it’s because we don’t have any data (i.e., polls) from 2 months in the future. When that happens, the beauty of this model is that it reverts to the fundamentals forecast (which doesn’t contain polls, only socio-econ variables available well in advance of the election) to still be able to make a forecast. Otherwise, you would just get a huge, uninformative uncertainty.
If you’ve already used Gaussian Processes, you can draw a parallel here, as GPs revert back to their mean when data become sparse.

Restating what I wrote above, in case it wasn’t clear: the model is reverting to the state-level fundamentals forecast when data become sparse.
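
The reversion to the fundamentals forecast can be worked out in closed form under a simplification: assume the state-level value on the last polling day is known exactly, and that it is linked to Election Day by a Gaussian random walk. The Election Day forecast is then a precision-weighted average of the fundamentals prior and the last polled value. The function name and all numbers below are hypothetical.

```python
def election_day_forecast(last_estimate, days_left, prior_mean, prior_sd, step_sd):
    """Posterior mean and sd of the Election Day value (logit scale).

    ASSUMPTIONS: the value on the last polling day is known exactly,
    days_left > 0, and the walk has independent Gaussian daily steps.
    """
    prior_precision = 1.0 / prior_sd ** 2
    walk_precision = 1.0 / (days_left * step_sd ** 2)
    post_var = 1.0 / (prior_precision + walk_precision)
    post_mean = post_var * (prior_precision * prior_mean + walk_precision * last_estimate)
    return post_mean, post_var ** 0.5


prior_mean = 0.10      # HYPOTHETICAL fundamentals forecast
last_estimate = -0.05  # HYPOTHETICAL polling-based estimate on the last polling day

forecasts = {}
for days_left in (60, 30, 7, 1):
    mean, sd = election_day_forecast(last_estimate, days_left, prior_mean, 0.05, 0.02)
    forecasts[days_left] = mean
    print(f"{days_left:2d} days left: forecast mean={mean:+.3f}, sd={sd:.3f}")
```

With 60 days to go, the forecast sits close to the fundamentals value; with one day to go, it sits close to the latest polling estimate.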

Definitely a good strategy
Good luck and happy holidays

Thanks again for taking the time to produce a long thoughtful response! I’m enjoying this conversation, and happy holidays to you too!

> It’s to make sure the GRW “walks” towards its priors instead of away from it when one doesn’t have much data. And the nice thing is that the prior is actually the result of another model, called “the fundamentals” model.

Ok this makes a lot of sense. Seems so simple now that you explain it haha. Thanks!

> When that happens, the beauty of this model is that it reverts to the fundamentals forecast (which doesn’t contain polls, only socio-econ variables available well in advance of the election) to still be able to make a forecast.

In this case, though, shouldn’t each state walk toward the dashed line, since that’s the Time-for-Change model? Florida does, but Indiana drifts away from it. Indiana also has far fewer polls than Florida, so I would expect the model to lean more heavily on the Time-for-Change output. Furthermore, with 2 months to go before the election, Indiana starts at the Time-for-Change result and the forecast drifts away from it.

Maybe I’m too focused on that single figure but I just can’t square that result with my understanding of the model.

Not necessarily, because the trend in each state is not only influenced by the fundamentals – it’s also influenced by the opinion trends in the 49 other states (through the hierarchical structure and the $\delta_j$ coefficients).
You don’t see that for Florida because the national trend and the fundamentals agree, but Indiana is indeed a good illustration of that fact, as noted in the paper:

> In Indiana, Obama was slightly ahead of the historical forecast, but this was atypical: in most states, as in Florida, fewer voters than expected were reporting a preference for Obama. As a result, estimates of $\delta_j < 0$; so after the final day of polling, $\pi_{ij}$ trended upward to Election Day. […] In Indiana, $\pi_{ij}$ moved away from the structural forecast, but again toward the actual outcome—thus correcting the substantial 5.5% error in the original Time-for-Change forecast.
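
The effect described in the quote can be traced with a toy calculation. The sketch assumes the vote share is the inverse logit of a state component plus the national component $\delta_j$, with $\delta_j$ pinned to zero on Election Day, as discussed earlier in the thread. The state component is held fixed, and the straight-line path of $\delta_j$ back to zero is an illustrative assumption; all numbers are hypothetical.

```python
import math


def inv_logit(x):
    """Map a logit-scale value to a share between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


beta_state = 0.0         # HYPOTHETICAL state component, held fixed here
delta_last_poll = -0.08  # HYPOTHETICAL national component on the last polling day (< 0)
days_left = 10

shares = []
for d in range(days_left + 1):
    # ASSUMPTION: delta moves in a straight line from its last polled value to 0.
    delta = delta_last_poll * (1 - d / days_left)
    shares.append(inv_logit(beta_state + delta))

for d, s in enumerate(shares):
    print(f"day +{d:2d} after last poll: share={s:.4f}")
```

Because the negative national component fades to zero, the forecast share rises toward Election Day even though the state component never changes, which is the kind of drift the Indiana panel shows.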

Hope this helps, but again, don’t get hung up on that, you’ll get there eventually – lots of moving parts in this model, so it’s completely normal to get confused

Ok cool, super helpful again. I’ll have to go through the paper again more carefully and maybe implement it myself or trace it out somehow to fully put all the pieces together.