I want to analyze data that consists of a series of trials (1/0) with a different probability for 1 on each of those trials. I assume I should use a Poisson Binomial distribution for this, but this is not one of the built-in distributions in PyMC3. There are a few packages to calculate the probability mass function of the Poisson Binomial, but I’m not sure how to combine these with PyMC3. From reading the docs, I gather that I should probably use pymc3.DensityDist, but I’m not quite sure how to begin. So my questions are:
Am I right in thinking that I should use DensityDist for this?
Are there example notebooks you recommend that I could work through to figure out how DensityDist works?
Has anyone by any chance used a Poisson Binomial in pymc3 and has an example that they are willing to share?
I haven’t used a Poisson binomial model yet, though I’m interested to see if it works.
Am I right in thinking that I should use DensityDist for this?
Yup, you could use DensityDist or Potential to do this - they’re largely the same from the point of view of model fitting.
Are there example notebooks you recommend that I could work through to figure out how DensityDist works?
Check this notebook out.
The only thing that might be tricky here is getting a stable log PDF. I am a complete novice at working with this distribution, but it looks like there is a recursive formula which is probably not going to work well with PyMC. That wikipedia page also has a formula obtained by using the discrete Fourier transform.
Maybe it would be OK if you draw a \mathbb{p} vector from a Dirichlet and use that as the parameters for a Bernoulli? The sum of that outcome should be the distribution that you want…
That will get you parameters which won’t necessarily throw any errors, but a value like p=[0.5, 0.9, 0.6] won’t sum to 1 but is still a perfectly valid vector of parameters for the Poisson binomial.