Poisson Binomial in PyMC3

I want to analyze data that consists of a series of trials (1/0) with a different probability for 1 on each of those trials. I assume I should use a Poisson Binomial distribution for this, but this is not one of the built-in distributions in PyMC3. There are a few packages to calculate the probability mass function of the Poisson Binomial, but I’m not sure how to combine these with PyMC3. From reading the docs, I gather that I should probably use pymc3.DensityDist, but I’m not quite sure how to begin. So my questions are:

  1. Am I right in thinking that I should use DensityDist for this?
  2. Are there example notebooks you recommend that I could work through to figure out how DensityDist works?
  3. Has anyone by any chance used a Poisson Binomial in pymc3 and has an example that they are willing to share?

thanks!

I haven’t used a Poisson binomial model yet, though I’m interested to see if it works.

  1. Am I right in thinking that I should use DensityDist for this?

Yup, you could use DensityDist or Potential to do this - they’re largely the same from the point of view of model fitting.

  1. Are there example notebooks you recommend that I could work through to figure out how DensityDist works?

Check this notebook out.
The only thing that might be tricky here is getting a stable log PDF. I am a complete novice at working with this distribution, but it looks like there is a recursive formula which is probably not going to work well with PyMC. That wikipedia page also has a formula obtained by using the discrete Fourier transform.

Maybe it would be OK if you draw a \mathbb{p} vector from a Dirichlet and use that as the parameters for a Bernoulli? The sum of that outcome should be the distribution that you want…

1 Like

That will get you parameters which won’t necessarily throw any errors, but a value like p=[0.5, 0.9, 0.6] won’t sum to 1 but is still a perfectly valid vector of parameters for the Poisson binomial.

Thanks, I’ll give it a try and post my results here if I manage.

Very true, my mistake. The correct thing would be to put a Beta (or Uniform or something like that) prior on each p_i.

1 Like