Hi everyone,
I have been playing around with PyMC for a while, using the results of an election as my toy dataset. The dataset consists of the district-level votes each of two candidates received and the number of male and female voters, like this:
| district | candidate 1 | candidate 2 | total votes | male | female |
|---|---|---|---|---|---|
| place a | 1000 | 500 | 1500 | 770 | 730 |
| place b | 300 | 600 | 900 | 400 | 500 |
| … | | | | | |
I want to study the difference in voting preference between men and women, p_{male} - p_{female}. The most natural model, I feel, is:
candidate_1 = male_1 + female_1
male_1 \sim binomial(male, p_{male})
female_1 \sim binomial(female, p_{female})
with some priors on p_{male} and p_{female}.
But this requires the convolution of two Binomial distributions, which doesn’t seem straightforward judging by the previous discussions.
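To make the convolution concrete, here is a minimal sketch in plain Python (no PyMC) of the exact probability of a candidate 1 total, obtained by summing over every possible male/female split. The function names, the tiny district sizes, and the values of p are all made up for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); zero outside the support."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def sum_of_binomials_pmf(total, n_male, p_male, n_female, p_female):
    """P(male_1 + female_1 = total): sum over all male/female splits."""
    lo = max(0, total - n_female)   # fewest male votes consistent with total
    hi = min(n_male, total)         # most male votes consistent with total
    return sum(
        binom_pmf(m, n_male, p_male)
        * binom_pmf(total - m, n_female, p_female)
        for m in range(lo, hi + 1)
    )

# Hypothetical tiny district: 7 men, 5 women, made-up preferences.
probs = [sum_of_binomials_pmf(k, 7, 0.6, 5, 0.4) for k in range(13)]
```

Every evaluation of the likelihood needs a sum over all splits, and this has to be repeated for each district, which is why the exact approach gets awkward with thousands of voters per district.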
Oh, I didn’t look carefully at the example data. You know the number of male and female voters, but not how many votes each candidate got from each group…
Yeah, it sounds like what you would like to have is a convolution of Binomials (a special case of the Poisson Binomial), which is not really tractable for your large N.
Since you have a large N, and possibly not extreme p, you could use the Normal approximation to the Binomial, and then the convolution of the two normals has a closed form solution.
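As a rough sketch of what that closed form looks like: each Binomial(n, p) is approximated by a Normal with mean n·p and variance n·p·(1−p), and for independent normals the means and variances simply add. The snippet below is plain Python rather than PyMC, and the function names and the values of p_male and p_female are made up for illustration.

```python
from math import log, pi, sqrt

def normal_approx_params(n_male, p_male, n_female, p_female):
    """Mean and std. dev. of the normal approximation to male_1 + female_1."""
    mu = n_male * p_male + n_female * p_female
    var = n_male * p_male * (1 - p_male) + n_female * p_female * (1 - p_female)
    return mu, sqrt(var)

def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    return -0.5 * log(2 * pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

# "place a" from the table, with hypothetical preferences (not estimates).
mu, sigma = normal_approx_params(770, 0.7, 730, 0.6)
loglike = normal_logpdf(1000, mu, sigma)  # candidate 1 got 1000 votes
```

In a PyMC model the same `mu` and `sigma` expressions would presumably be built from the priors on p_{male} and p_{female} and passed to a Normal likelihood with the candidate 1 counts as observed data, one term per district.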
This is probably close in spirit to your idea of modelling p as the gender ratio (I don’t know exactly what you had in mind there though).