Modeling US Presidential Election w/ Polling

awalters · July 12, 2024, 5:15pm

Hi all,

I built a model for the current US presidential election using PyMC. Here’s a presentation summarizing the findings with nice graphics, the 2024 two-way model, the 2024 three-way model, and backtesting with the 2020 model.

I based the model on Alexandre Andorra and Rémi Louf’s blog post on modeling the popularity of French presidents. I was able to recreate/update their code here to run on the current version of PyMC. I also used Heidemanns, Gelman, and Morris’ paper outlining the model used by The Economist.

I would appreciate any feedback, especially with how to handle the long tails with the potential third-party spoiler. Because there’s only one poll in my home state of Washington where Trump leads due to Kennedy pulling more Biden voters away, the model uses the n=1 to say that in a three-way race Trump may flip WA. I am of two minds - I kind of want to just let the model roll and wait to see if there is more three-way state level polling (inevitably will happen). Kennedy is polling better nationally than in any individual state, so maybe there’s a real effect here not seen in the state-level polling.

Thanks,
Andrew

cluhmann · July 12, 2024, 5:49pm

@AlexAndorra

awalters · July 12, 2024, 6:54pm

Also, I tried a Dirichlet distribution but it took forever to sample. The results ended up being identical (near as I can tell) to running three separate binomials for each poll, because they add up to 100% anyways. Are there any other distributions I should consider?

AlexAndorra · July 14, 2024, 10:01am

Great job, well done Andrew! And fantastic plots

Regarding using the binomial for more than two candidates, I would avoid doing that, because that’s gonna underestimate the correlations between the candidates – i.e., the zero-sum game where one has to lose for another to win.

The model can’t take these correlations into account if the probabilities are estimated independently. A multinomial will be what you’re looking for here.
You can see how we did it for the 2022 French presidential elections here – associated podcast episode.
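To make the correlation point concrete, here is a small simulation sketch. It is not taken from either model: the three shares and the poll size are made-up numbers, and it only uses NumPy. It draws polls jointly from a multinomial and separately from three independent binomials, then compares the correlation between the first two candidates' counts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shares for DEM, GOP, OTH and a hypothetical poll size.
p = np.array([0.46, 0.44, 0.10])
n_polls, sample_size = 5000, 1000

# Joint draw: the three counts in every simulated poll sum to sample_size.
multi = rng.multinomial(sample_size, p, size=n_polls)

# Independent draws: each candidate's count ignores the other two.
indep = rng.binomial(sample_size, p, size=(n_polls, 3))

# Correlation between the DEM and GOP counts under each scheme.
corr_multi = np.corrcoef(multi[:, 0], multi[:, 1])[0, 1]
corr_indep = np.corrcoef(indep[:, 0], indep[:, 1])[0, 1]

print(f"multinomial corr(DEM, GOP):  {corr_multi:.2f}")
print(f"independent corr(DEM, GOP): {corr_indep:.2f}")
```

With these made-up shares the multinomial counts come out strongly negatively correlated, while the independent binomial counts are essentially uncorrelated and their rows no longer have to add up to the sample size.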

You’ll also see we’re using a custom GP in this model, instead of the random walk from the blog post you referenced. Today, I would very probably use an HSGP instead.
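For anyone who has not used it, a minimal sketch of what an HSGP time trend could look like is below. This is an illustration under assumptions, not the code from the French model: the priors, the Matern52 kernel, `m=[20]`, `c=1.5`, and the names `make_day_grid`, `build_hsgp_trend` and `dem_gp` are all placeholders that would need tuning for real polling data.

```python
import numpy as np


def make_day_grid(n_days):
    """Return day indices as a column vector, the 2D input shape a GP expects."""
    return np.arange(n_days, dtype=float)[:, None]


def build_hsgp_trend(n_days):
    """Build a toy model whose only component is an HSGP over days."""
    import pymc as pm  # imported here so make_day_grid works without PyMC

    X = make_day_grid(n_days)
    with pm.Model(coords={"day": np.arange(n_days)}) as model:
        # Assumed priors: lengthscale in days (prior mean 50) and amplitude.
        ell = pm.Gamma("ell", alpha=5, beta=0.1)
        eta = pm.HalfNormal("eta", sigma=0.3)
        cov = eta**2 * pm.gp.cov.Matern52(input_dim=1, ls=ell)

        # m = number of basis functions, c = boundary factor (both assumed).
        gp = pm.gp.HSGP(m=[20], c=1.5, cov_func=cov)
        f = gp.prior("dem_gp", X=X)

        # Expose the trend under a "day" dimension for later indexing.
        pm.Deterministic("dem_day_effect", f, dims="day")
    return model
```

The idea is that the smooth `dem_day_effect` would stand in for a random-walk day effect, with the kernel lengthscale controlling how quickly support is allowed to move.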

Hope this helps, and PyMCheers

awalters · July 14, 2024, 10:47am

Thank you @AlexAndorra for the feedback! I’ll be traveling the next few days but I’ll try my hand at implementing this. I appreciate it.

Thanks,
Andrew

awalters · July 20, 2024, 10:18pm

Hi @AlexAndorra, I am working on switching to a multinomial. Because I am using both two-way and three-way polling, is it possible to use a multinomial with zeroes for the polls without a candidate?

AlexAndorra · July 22, 2024, 8:31pm

It’s possible, but I wouldn’t do that, as those polls are not directly comparable: respondents are not given the same choices, so the underlying probabilities differ.

Something you could do is use masking to force the option that’s not there to have zero probability – e.g., set its value to -999 before the softmax, which is gonna turn into 0 when you softmax everything to get the vector of probabilities. It’s what I did a few years ago in the presidential model I linked a few posts above, IIRC.
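A minimal NumPy sketch of that masking idea, outside of any model, might look like this. The helper name `masked_softmax`, the example logits, and the `available` mask are hypothetical; only the -999 value comes from the suggestion above.

```python
import numpy as np


def masked_softmax(logits, available, mask_value=-999.0):
    """Softmax over the last axis, forcing unavailable options to ~0 probability."""
    masked = np.where(available, logits, mask_value)
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = masked - masked.max(axis=-1, keepdims=True)
    expd = np.exp(shifted)
    return expd / expd.sum(axis=-1, keepdims=True)


# Two polls with the same latent scores for DEM, GOP, OTH (made-up numbers).
logits = np.array([[0.2, 0.1, -1.5],
                   [0.2, 0.1, -1.5]])

# The second poll is a two-way poll: the third option was not offered.
available = np.array([[True, True, True],
                      [True, True, False]])

probs = masked_softmax(logits, available)
print(probs.round(3))
```

In the masked row the third probability underflows to zero and its share is redistributed across the two remaining candidates, so the row still sums to one. The observed count for a masked option would also have to be zero in that poll.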

You can also always use two different likelihoods in the same model if that’s easier for now.

Hope this helps

awalters · October 27, 2024, 4:13pm

Hi @AlexAndorra, I’ve picked my model back up and written it up in a blog post here. I have been trying to implement your suggestion of using a multinomial and also doing an HSGP. I’ve been trying for a few days but haven’t been able to get either to work; maybe I am missing something obvious. I gave you a citation at the bottom.

For me the multinomial is the most pressing, just because of the implications in the correlations between outputs. Right now I have the following binomial repeated thrice:

    dem_vote = pm.Binomial(
        "dem_vote",
        p = dem_polling,
        n = df["sample_size"],
        observed = df['sample_size'] * df['DEM'],
        dims = "observation",
    )

How can I rewrite this for a multinomial? I tried using the three polling vars and stacking them to make an array for “p” but I keep getting a shape issue.
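One layout that should avoid the shape mismatch is sketched below, under assumptions: `p` and the observed counts both have shape (observation, candidate), `n` has shape (observation,), and each row of counts sums exactly to its `n`. The helper `shares_to_counts`, the `latent` placeholder, and the `candidate` coordinate are hypothetical. In the real model the three latent polling components would be stacked along the last axis (for example with `pm.math.stack([...], axis=-1)`) before the softmax, rather than stacking three separately squashed probabilities that need not sum to one.

```python
import numpy as np


def shares_to_counts(sample_size, shares):
    """Turn poll shares into integer counts whose rows sum to the returned n."""
    sample_size = np.asarray(sample_size)
    shares = np.asarray(shares, dtype=float)
    # Renormalise so each row sums to 1 (drops undecideds proportionally).
    shares = shares / shares.sum(axis=1, keepdims=True)
    counts = np.round(shares * sample_size[:, None]).astype(int)
    # Use the rounded row totals as n so counts and n always agree.
    return counts, counts.sum(axis=1)


def build_multinomial_model(counts, n):
    """Toy model: one latent score per poll and candidate, multinomial likelihood."""
    import pymc as pm  # imported here so shares_to_counts works without PyMC

    coords = {
        "observation": np.arange(len(n)),
        "candidate": ["DEM", "GOP", "OTH"],
    }
    with pm.Model(coords=coords) as model:
        # Placeholder for the stacked latent polling components.
        latent = pm.Normal("latent", 0.0, 1.0, dims=("observation", "candidate"))
        # Softmax across candidates so every row of p sums to 1.
        p = pm.Deterministic(
            "p", pm.math.softmax(latent, axis=-1), dims=("observation", "candidate")
        )
        pm.Multinomial(
            "votes", n=n, p=p, observed=counts, dims=("observation", "candidate")
        )
    return model


# Example with made-up polls: shares for DEM, GOP, OTH.
counts, n = shares_to_counts([1000, 800], [[0.46, 0.44, 0.10], [0.50, 0.45, 0.05]])
```

With a dataframe like the one above, the inputs would be something like `df["sample_size"]` and `df[["DEM", "GOP", "OTH"]]`, assuming those three columns exist.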

Here is my random walk component that should be rewritten for HSGP. It’s repeated for ‘gop’ and ‘oth’ as well as ‘dem’.

    epsilon = 1e-6

    dem_sigma       = pm.HalfNormal("dem_sigma", sigma=0.1) + epsilon
    dem_rho         = pm.Normal("dem_rho", mu=0, sigma=0.7)
    dem_ar          = pm.AR("dem_ar",
                            rho=[dem_rho] * 7,
                            sigma=dem_sigma,
                            init_dist=pm.Normal.dist(mu=0, sigma=0.05),
                            dims="day"
                           )
    dem_sigma_rw    = pm.HalfNormal("dem_sigma_rw", sigma=0.1) + epsilon
    dem_random_walk = pm.GaussianRandomWalk("dem_random_walk",
                                            sigma=dem_sigma_rw,
                                            init_dist=pm.Normal.dist(mu=0, sigma=0.3),
                                            dims="day")
    dem_day_effect  = pm.Deterministic("dem_day_effect", dem_ar + dem_random_walk, dims="day")

Thanks!

awalters · October 27, 2024, 4:17pm

Also, since RFK Jr. dropped out, I am no longer worried about the two different types of polls! Thank you for the suggestion here.

awalters · November 1, 2024, 9:07pm

For whoever is interested, I wrote up an update with the model’s latest output. Additionally, I have run backtests for every US Presidential election since 2004.

awalters · January 17, 2025, 2:10am

Hi all, since some folks have been reaching out about election modeling - I just made a model for the upcoming German federal election. I revised my code to add in a momentum factor and a multivariate Gaussian random walk. This better allows for multi-party elections. (It also now takes over three hours to run on my poor laptop but… the results look great!)
