Hi everyone,
I have been playing around with PyMC for a while and am using the results of an election as my toy dataset. For each district, it contains the votes each of the two candidates received and the number of male and female voters, like this:

| district | candidate 1 | candidate 2 | total votes | male | female |
|---|---|---|---|---|---|
| place a | 1000 | 500 | 1500 | 770 | 730 |
| place b | 300 | 600 | 900 | 400 | 500 |

…

I want to study the difference in voting preference between men and women, p_{male}-p_{female}. The most natural model, I feel, is:

candidate_1 = male_1 + female_1

male_1 \sim binomial(male, p_{male})

female_1 \sim binomial(female, p_{female})

with some priors on p_{male} and p_{female}.

But this requires the convolution of two Binomial distributions, which doesn’t seem straightforward, judging by the previous discussions.

Oh, I didn’t look carefully at the example data. You know the number of male and female voters, but not how many votes each candidate got from each group…

Yeah, it sounds like what you would like to have is a convolution of Binomials, or a Poisson Binomial, which is not really tractable for your large N.

Since you have a large N, and possibly not extreme p, you could use the Normal approximation to the Binomial. The convolution of the two Normals then has a closed-form solution: another Normal whose mean and variance are the sums of the two means and the two variances.
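To get a feel for how good that approximation is at these sample sizes, here is a minimal sketch using only the standard library. It compares the exact probability of a candidate 1 total (by brute-force convolution of the two Binomials) with the density of a Normal with mean male·p_{male} + female·p_{female} and variance male·p_{male}(1−p_{male}) + female·p_{female}(1−p_{female}). The function names and the values of p_{male} and p_{female} in the usage lines are made up for illustration, not estimates from the data.

```python
import math


def binom_logpmf(k, n, p):
    """Log of the Binomial(n, p) pmf at k, computed via lgamma for stability."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def exact_sum_pmf(total, n_m, n_f, p_m, p_f):
    """P(male_1 + female_1 = total) by direct convolution of the two Binomials."""
    lo = max(0, total - n_f)   # male_1 cannot be smaller than this
    hi = min(n_m, total)       # ... or larger than this
    return sum(
        math.exp(binom_logpmf(k, n_m, p_m) + binom_logpmf(total - k, n_f, p_f))
        for k in range(lo, hi + 1)
    )


def normal_approx_pdf(total, n_m, n_f, p_m, p_f):
    """Density of the Normal that approximates the sum of the two Binomials."""
    mu = n_m * p_m + n_f * p_f
    var = n_m * p_m * (1 - p_m) + n_f * p_f * (1 - p_f)
    return math.exp(-0.5 * (total - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


# "place a" from the table, with hypothetical preferences p_m = 0.7, p_f = 0.63
exact = exact_sum_pmf(1000, 770, 730, 0.7, 0.63)
approx = normal_approx_pdf(1000, 770, 730, 0.7, 0.63)
print(exact, approx)
```

The brute-force sum is fine for a one-off check like this, but it costs one loop over possible male counts per district per likelihood evaluation, which is why the closed-form Normal is attractive inside a sampler.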

This is probably close in spirit to your idea of modelling p as the gender ratio (I don’t know exactly what you had in mind there though).
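A rough sketch of what the Normal-approximation model could look like in PyMC is below. Several things here are assumptions rather than anything stated in the thread: the flat Beta(1, 1) priors (the question only says "some priors"), a single p_{male} and p_{female} shared across all districts, and the helper name `build_model`. Only the two districts shown in the table are used as data; the real dataset has more rows.

```python
# The two districts shown in the question; the real dataset has more rows.
male = [770, 400]
female = [730, 500]
candidate_1 = [1000, 300]


def build_model(male, female, candidate_1):
    """Hypothetical helper: Normal approximation to the sum of two Binomials."""
    # Imported here so the data above can be inspected without PyMC installed.
    import numpy as np
    import pymc as pm

    n_m = np.asarray(male, dtype=float)
    n_f = np.asarray(female, dtype=float)
    y = np.asarray(candidate_1, dtype=float)

    with pm.Model() as model:
        # Assumed priors: flat on (0, 1) for both preferences.
        p_male = pm.Beta("p_male", alpha=1.0, beta=1.0)
        p_female = pm.Beta("p_female", alpha=1.0, beta=1.0)

        # Mean and variance of male_1 + female_1 under the Normal approximation.
        mu = n_m * p_male + n_f * p_female
        sigma = pm.math.sqrt(
            n_m * p_male * (1 - p_male) + n_f * p_female * (1 - p_female)
        )

        pm.Normal("candidate_1", mu=mu, sigma=sigma, observed=y)

        # The quantity of interest from the original question.
        pm.Deterministic("diff", p_male - p_female)

    return model
```

After `import pymc as pm`, sampling would be `with build_model(male, female, candidate_1): idata = pm.sample()`, and the posterior of `diff` is then p_{male}-p_{female}. Since only district totals are observed, the two probabilities are told apart mainly through how the male/female split varies between districts, so I would expect a wide posterior for `diff` unless that split varies a lot.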