Modeling US Presidential Election w/ Polling

Hi all,

I built a model for the current US presidential election using PyMC. Here’s a presentation summarizing the findings with nice graphics, the 2024 two-way model, the 2024 three-way model, and backtesting with the 2020 model.

I based the model on Alexandre Andorra and Rémi Louf’s blog post on modeling the popularity of French presidents. I was able to recreate/update their code here to run on the current version of PyMC. I also used Heidemanns, Gelman, and Morris’ paper outlining the model used by The Economist.

I would appreciate any feedback, especially on how to handle the long tails created by the potential third-party spoiler. My home state of Washington has only a single poll, one in which Trump leads because Kennedy pulls more voters away from Biden, and the model uses that n=1 to suggest Trump may flip WA in a three-way race. I am of two minds: part of me wants to just let the model run and wait for more three-way state-level polling, which will inevitably come. Kennedy is also polling better nationally than in any individual state, so maybe there's a real effect here that the state-level polling doesn't capture.

Thanks,
Andrew


@AlexAndorra


Also, I tried a Dirichlet distribution, but it took forever to sample. As near as I can tell, the results were identical to running three separate binomials for each poll, since the shares add up to 100% anyway. Are there any other distributions I should consider?

Great job, well done Andrew! And fantastic plots :star_struck:

Regarding using the binomial for more than two candidates, I would avoid doing that, because that's gonna underestimate the correlations between the candidates, i.e., the zero-sum game where one has to lose for another to win.

The model can’t take these correlations into account if the probabilities are estimated independently. A multinomial will be what you’re looking for here.
You can see how we did it for the 2022 French presidential elections here, along with the associated podcast episode.
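To make the zero-sum point concrete, here is a small NumPy sketch (the shares and sample size are made up for illustration) comparing simulated poll counts from one multinomial with counts from three independent binomials. Under the multinomial, Cov(X_i, X_j) = -n p_i p_j, so a gain for one candidate is a loss for another; the independent binomials carry no such coupling, and their totals don't even have to add up to the sample size.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical vote shares (DEM, GOP, OTH) and poll sample size
p = np.array([0.47, 0.45, 0.08])
n = 1000
n_sims = 100_000

# One multinomial draw per simulated poll: counts always sum to n
multi = rng.multinomial(n, p, size=n_sims)

# Three independent binomials per simulated poll: totals are free to drift from n
indep = rng.binomial(n, p, size=(n_sims, 3))

# Theoretical multinomial covariance matrix: n * (diag(p) - p p^T)
theory = n * (np.diag(p) - np.outer(p, p))

multi_cov = np.cov(multi, rowvar=False)
indep_cov = np.cov(indep, rowvar=False)

print("DEM/GOP covariance, multinomial:", round(multi_cov[0, 1], 1))
print("DEM/GOP covariance, theory:     ", round(theory[0, 1], 1))
print("DEM/GOP covariance, binomials:  ", round(indep_cov[0, 1], 1))
print("Totals equal n (multinomial):", bool(np.all(multi.sum(axis=1) == n)))
print("Totals equal n (binomials):  ", bool(np.all(indep.sum(axis=1) == n)))
```

The negative off-diagonal term is exactly the information that three separate binomial likelihoods throw away.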

You’ll also see we’re using a custom GP in this model, instead of the random walk from the blog post you referenced. Today, I would very probably use an HSGP instead.

Hope this helps, and PyMCheers :v:

Thank you @AlexAndorra for the feedback! I’ll be traveling the next few days but I’ll try my hand at implementing this. I appreciate it.

Thanks,
Andrew

Hi @AlexAndorra, I am working on switching to a multinomial. Because I am using both two-way and three-way polling, is it possible to use a multinomial with zeroes for the polls without a candidate?

It’s possible, but I wouldn’t do that, as those polls are not directly comparable: respondents are not given the same choices, so the underlying probabilities differ.

Something you could do is use masking to force the option that's not there to have zero probability, e.g., by setting its latent value to -999, which is gonna turn into 0 when you softmax everything to get the vector of probabilities. It's what I did a few years ago in the presidential model I linked a few posts above, IIRC.
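A minimal NumPy sketch of that masking idea, assuming each poll gets a row of latent values for (DEM, GOP, RFK) plus a boolean mask marking which candidates were actually offered; the poll rows and the helper name `masked_softmax` are hypothetical.

```python
import numpy as np

def masked_softmax(logits, offered, fill=-999.0):
    """Softmax over candidates, forcing options absent from a poll to ~0 probability."""
    # Replace the latent values of candidates not offered in the poll with a huge negative number
    masked = np.where(offered, logits, fill)
    # Subtract the row max for numerical stability before exponentiating
    shifted = masked - masked.max(axis=-1, keepdims=True)
    expd = np.exp(shifted)
    return expd / expd.sum(axis=-1, keepdims=True)

# Latent support for (DEM, GOP, RFK); identical for both polls to isolate the mask's effect
logits = np.array([[0.10, 0.05, -1.50],
                   [0.10, 0.05, -1.50]])
# First poll is three-way, second poll is a two-way head-to-head
offered = np.array([[True, True, True],
                    [True, True, False]])

probs = masked_softmax(logits, offered)
print(probs.round(3))
```

The masked candidate's probability underflows to exactly zero, while the DEM/GOP ratio stays the same in both rows. In a PyMC model the same pattern could be written with `pt.where` and `pm.math.softmax(..., axis=-1)`.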

You can also always use two different likelihoods in the same model if that's easier for now.

Hope this helps :vulcan_salute:

Hi @AlexAndorra, I've picked my model back up and wrote it up in a blog post here. I have been trying to implement your suggestions of using a multinomial and an HSGP. I've been at it for a few days but haven't been able to get either to work; maybe I am missing something obvious. :slight_smile: I gave you a citation at the bottom.

For me the multinomial is the most pressing, just because of the implications in the correlations between outputs. Right now I have the following binomial repeated thrice:

    dem_vote = pm.Binomial(
        "dem_vote",
        p=dem_polling,
        n=df["sample_size"],
        # Binomial observations must be integer counts, so round share * sample size
        observed=(df["sample_size"] * df["DEM"]).round().astype(int),
        dims="observation",
    )

How can I rewrite this as a multinomial? I tried stacking the three polling variables into an array for "p", but I keep getting a shape error.

Here is my random walk component, which I'd like to rewrite as an HSGP. It's repeated for 'gop' and 'oth' as well as 'dem'.

    epsilon = 1e-6

    dem_sigma       = pm.HalfNormal("dem_sigma", sigma=0.1) + epsilon
    dem_rho         = pm.Normal("dem_rho", mu=0, sigma=0.7)
    dem_ar          = pm.AR("dem_ar",
                            rho=[dem_rho] * 7,
                            sigma=dem_sigma,
                            init_dist=pm.Normal.dist(mu=0, sigma=0.05),
                            dims="day"
                           )
    dem_sigma_rw    = pm.HalfNormal("dem_sigma_rw", sigma=0.1) + epsilon
    dem_random_walk = pm.GaussianRandomWalk("dem_random_walk",
                                            sigma=dem_sigma_rw,
                                            init_dist=pm.Normal.dist(mu=0, sigma=0.3),
                                            dims="day")
    dem_day_effect  = pm.Deterministic("dem_day_effect", dem_ar + dem_random_walk, dims="day")

Thanks!


Also, since RFK Jr. dropped out, I am no longer worried about the two different types of polls! Thank you for the suggestion here.

For whoever is interested, I wrote up an update with the model's latest output. I have also run backtests for every US presidential election since 2004.


Hi all, since some folks have been reaching out about election modeling, I just made a model for the upcoming German federal election. I revised my code to add a momentum factor and a multivariate Gaussian random walk, which handles multi-party elections better. (It also now takes over three hours to run on my poor laptop, but… the results look great!)
