# Modeling voting preference by gender

Hi everyone,
I have been playing around with PyMC for a while, using the results of an election as my toy dataset. For each district, the dataset contains the votes each of the two candidates received and the number of male and female voters, like this:

| district | candidate 1 | candidate 2 | total votes | male | female |
|---|---|---|---|---|---|
| place a | 1000 | 500 | 1500 | 770 | 730 |
| place b | 300 | 600 | 900 | 400 | 500 |

I want to study the difference in voting preference between men and women, $p_{male} - p_{female}$. The most natural model, I feel, is:

$$candidate_1 = male_1 + female_1$$

$$male_1 \sim \mathrm{Binomial}(male, p_{male})$$

$$female_1 \sim \mathrm{Binomial}(female, p_{female})$$

with some priors on $p_{male}$ and $p_{female}$.
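To make the generative process concrete, here is a minimal simulation sketch using only the standard library. The function name `simulate_district` and the values 0.70 and 0.63 are made up for illustration; only the voter counts come from the table.

```python
import random

def simulate_district(male, female, p_male, p_female, rng):
    """Draw one district from the model above.

    Each Binomial is drawn as a sum of Bernoulli trials, which is slow
    but avoids any dependency beyond the standard library.
    """
    male_1 = sum(rng.random() < p_male for _ in range(male))
    female_1 = sum(rng.random() < p_female for _ in range(female))
    # Only the total is observed in the real data; the split is latent.
    return male_1 + female_1, male_1, female_1

rng = random.Random(0)
# "place a" voter counts from the table; p_male and p_female are hypothetical.
candidate_1, male_1, female_1 = simulate_district(770, 730, 0.70, 0.63, rng)
```

In the real data only `candidate_1` is available, which is exactly why the two Binomials have to be convolved.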

But this requires the convolution of two Binomial distributions, which doesn't seem straightforward judging by previous discussions.

So, how would you model that?

If you know the female and male votes for each candidate, there's no need to add them up.

That's the thing, I don't. I only know the number of male and female voters and the number of votes candidates 1 and 2 got, as in the table above.

One possible way to model this is to define `p` as a function of the male/female ratio. But what I described earlier feels like a more natural generative process.

Oh, I didn't look carefully at the example data. You know the number of male and female voters, but not how many votes each candidate got from each group…

Yeah, it sounds like what you want is a convolution of Binomials, or a Poisson Binomial, which is not really tractable for your large `N`.

Since you have a large `N`, and possibly not extreme `p`, you could use the Normal approximation to the Binomial. The convolution of the two Normals then has a closed-form solution.
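As a rough sketch of that idea: the sum of the two approximating Normals is again Normal, with mean `male * p_male + female * p_female` and variance `male * p_male * (1 - p_male) + female * p_female * (1 - p_female)`. The snippet below compares that density with a brute-force convolution for a single district at fixed probabilities. The function names and the values 0.30 and 0.36 are hypothetical, and both probabilities are assumed to lie strictly between 0 and 1.

```python
import math

def log_binom_pmf(k, n, p):
    """Exact Binomial log-pmf via log-gamma (requires 0 < p < 1)."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)

def exact_convolution_pmf(votes, male, female, p_male, p_female):
    """P(male_1 + female_1 = votes), summing over every feasible split."""
    lo = max(0, votes - female)
    hi = min(male, votes)
    return sum(
        math.exp(log_binom_pmf(k, male, p_male)
                 + log_binom_pmf(votes - k, female, p_female))
        for k in range(lo, hi + 1)
    )

def normal_approx_logpdf(votes, male, female, p_male, p_female):
    """Log-density of the Normal approximation to the convolution."""
    mu = male * p_male + female * p_female
    var = male * p_male * (1 - p_male) + female * p_female * (1 - p_female)
    return -0.5 * math.log(2 * math.pi * var) - (votes - mu) ** 2 / (2 * var)

# "place b" from the table: 400 male and 500 female voters, 300 votes
# for candidate 1. The probabilities are illustrative, not estimates.
exact = exact_convolution_pmf(300, 400, 500, 0.30, 0.36)
approx = math.exp(normal_approx_logpdf(300, 400, 500, 0.30, 0.36))
print(exact, approx)
```

The brute-force sum is fine as a one-off check, but it has to be repeated for every district at every sampler step, whereas the Normal log-density is a single expression that could serve as the likelihood for the observed `candidate_1` counts.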

The Normal approximation is probably close in spirit to your idea of modelling `p` as a function of the gender ratio (I don't know exactly what you had in mind there, though).
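One possible reading of that connection, as a sketch: write $total = male + female$ and $r = male / total$ for the male share of voters in a district (both symbols are introduced here for illustration). Dividing the expected vote count by the total gives

$$\frac{E[candidate_1]}{total} = \frac{male \cdot p_{male} + female \cdot p_{female}}{total} = p_{female} + (p_{male} - p_{female})\, r$$

so under this model the expected vote share of candidate 1 is linear in the male share, and the slope is the difference $p_{male} - p_{female}$ that the question is about.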
