Hi all,

I built a model for the current US presidential election using PyMC. Here’s a presentation summarizing the findings with nice graphics, the 2024 two-way model, the 2024 three-way model, and backtesting with the 2020 model.

I based the model on Alexandre Andorra and Rémi Louf’s blog post on modeling the popularity of French presidents. I was able to recreate/update their code here to run on the current version of PyMC. I also used Heidemanns, Gelman, and Morris’ paper outlining the model used by The Economist.

I would appreciate any feedback, especially on how to handle the long tails with the potential third-party spoiler. Because there’s only one poll in my home state of Washington where Trump leads due to Kennedy pulling more Biden voters away, the model uses that n=1 to say that in a three-way race Trump may flip WA. I am of two minds. I kind of want to just let the model roll and wait to see if there is more three-way state-level polling (which will inevitably happen). Kennedy is polling better nationally than in any individual state, so maybe there’s a real effect here not seen in the state-level polling.

Thanks,

Andrew

Also, I tried a Dirichlet distribution but it took forever to sample. The results ended up being identical (near as I can tell) to running three separate binomials for each poll, because they add up to 100% anyways. Are there any other distributions I should consider?

Great job, well done Andrew! And fantastic plots

Regarding using the binomial for more than two candidates, I would avoid doing that, because that’s gonna underestimate the correlations between the candidates, i.e., the zero-sum game where one has to lose for another to win.

The model can’t take these correlations into account if the probabilities are estimated independently. A multinomial will be what you’re looking for here.
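A minimal NumPy-only sketch of that point, with made-up vote shares and poll sizes (none of these numbers come from the model in this thread). It compares counts drawn jointly from one multinomial with counts drawn from three separate binomials:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed, illustrative shares for DEM, GOP, OTH (not from any real poll).
p = np.array([0.45, 0.45, 0.10])
n, n_polls = 1000, 20000

# Joint draw: each simulated poll's three counts must sum to n.
joint = rng.multinomial(n, p, size=n_polls)

# Separate draws: each candidate's count ignores the other two.
separate = np.column_stack([rng.binomial(n, pk, size=n_polls) for pk in p])

# Correlation between the DEM and GOP counts under each scheme.
corr_joint = np.corrcoef(joint[:, 0], joint[:, 1])[0, 1]
corr_separate = np.corrcoef(separate[:, 0], separate[:, 1])[0, 1]

print(f"multinomial DEM/GOP correlation: {corr_joint:.2f}")
print(f"independent binomials DEM/GOP correlation: {corr_separate:.2f}")
```

The joint draws come out strongly negatively correlated, while the separate draws are close to uncorrelated. That is the structure three independent binomials cannot represent at the likelihood level.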

You can see how we did it for the 2022 French presidential elections here – associated podcast episode.

You’ll also see we’re using a custom GP in this model, instead of the random walk from the blog post you referenced. Today, I would very probably use an HSGP instead.

Hope this helps, and PyMCheers

Thank you @AlexAndorra for the feedback! I’ll be traveling the next few days but I’ll try my hand at implementing this. I appreciate it.

Thanks,

Andrew

Hi @AlexAndorra, I am working on switching to a multinomial. Because I am using both two-way and three-way polling, is it possible to use a multinomial with zeroes for the polls without a candidate?

It’s possible, but I wouldn’t do that, as those polls are not directly comparable: respondents are not given the same choices, so the underlying probabilities differ.

Something you could do is use masking to force the option that’s not there to have zero probability: set its latent value to a very negative number, e.g. -999, which is gonna turn into 0 when you softmax everything to get the vector of probabilities. It’s what I did a few years ago in the presidential model I linked a few posts above, IIRC.

You can also always use two different likelihoods in the same model if that’s easier for now.
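A rough NumPy sketch of the masking idea, to show the mechanics only. The logits, the `offered` mask, and the `softmax` helper are all hypothetical names and values; in a PyMC model the same thing would be done on the latent tensor before the softmax.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical latent values per poll (columns: DEM, GOP, OTH).
logits = np.array([
    [0.2, 0.1, -1.5],  # three-way poll
    [0.3, 0.2, -1.2],  # two-way poll: OTH was not offered
])

# True where the candidate was actually offered to respondents.
offered = np.array([
    [True, True, True],
    [True, True, False],
])

# Replace the latent value of any option not offered with -999.
masked = np.where(offered, logits, -999.0)
probs = softmax(masked)

print(probs.round(3))
```

In the two-way row the masked option gets probability 0 and the remaining two probabilities are renormalized to sum to 1, so the two kinds of polls share the same latent values without pretending respondents had the same choices.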

Hope this helps

Hi @AlexAndorra, I’ve picked my model back up and written it up in a blog post here. I have been trying to implement your suggestions of using a multinomial and an HSGP. I’ve been at it for a few days but haven’t been able to get either to work; maybe I am missing something obvious. I gave you a citation at the bottom.

For me the multinomial is the most pressing, just because of the implications in the correlations between outputs. Right now I have the following binomial repeated thrice:

```
dem_vote = pm.Binomial(
    "dem_vote",
    p=dem_polling,
    n=df["sample_size"],
    observed=df["sample_size"] * df["DEM"],
    dims="observation",
)
```

How can I rewrite this for a multinomial? I tried stacking the three polling vars into an array for “p” but I keep getting a shape issue.

Here is my random walk component that should be rewritten for an HSGP. It’s repeated for ‘gop’ and ‘oth’ as well as ‘dem’.

```
epsilon = 1e-6
dem_sigma = pm.HalfNormal("dem_sigma", sigma=0.1) + epsilon
dem_rho = pm.Normal("dem_rho", mu=0, sigma=0.7)
dem_ar = pm.AR(
    "dem_ar",
    rho=[dem_rho] * 7,
    sigma=dem_sigma,
    init_dist=pm.Normal.dist(mu=0, sigma=0.05),
    dims="day",
)
dem_sigma_rw = pm.HalfNormal("dem_sigma_rw", sigma=0.1) + epsilon
dem_random_walk = pm.GaussianRandomWalk(
    "dem_random_walk",
    sigma=dem_sigma_rw,
    init_dist=pm.Normal.dist(mu=0, sigma=0.3),
    dims="day",
)
dem_day_effect = pm.Deterministic("dem_day_effect", dem_ar + dem_random_walk, dims="day")
```

Thanks!

Also, since RFK Jr. dropped out, I am no longer worried about the two different types of polls! Thank you for the suggestion here.

For whoever is interested, I wrote up an update with the model’s latest output. Additionally, I have run backtests for every US presidential election since 2004.
