What is the best way to estimate theta and psi for ZeroInflatedPoisson?

jordan.howell2 · May 26, 2022, 7:52pm

Hello,

I need to run a model on a target variable that is ZeroInflatedPoisson. I see from the Discrete page of the PyMC3 3.11.5 documentation that the distribution takes parameters theta and psi. I have read that I should estimate theta as (number of samples equal to 0) / (total number of samples).

What is the best way to set psi? The documentation says psi is the “Expected proportion of Poisson variates (0 < psi < 1)”. I don’t know what that means.

Also, how would I use this with a regression model? Is the regression formula used for the parameter theta?

jessegrabowski · May 27, 2022, 3:40am

I think this is backwards. I found this discussion of the ZIP parameters helpful.

Theta is the usual rate parameter of a Poisson distribution. You can confirm this by directly comparing the 2nd branch of the ZIP PDF shown on the page you linked with the PDF of an ordinary Poisson (which you recover when psi = 1).
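For reference, the ZIP mass function in the PyMC3 parameterization, with mixing weight \psi and Poisson rate \theta, can be written as:

$$
f(x \mid \psi, \theta) =
\begin{cases}
(1 - \psi) + \psi e^{-\theta}, & x = 0 \\
\psi \dfrac{e^{-\theta} \theta^{x}}{x!}, & x = 1, 2, 3, \ldots
\end{cases}
$$

Setting \psi = 1 removes the extra mass at zero and leaves the plain Poisson mass function e^{-\theta} \theta^{x} / x! in both branches.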

So you want to estimate theta as usual in a normal Poisson setup: do some stuff to get logits, then set theta = exp(logits).

Psi is related to how many “extra” zeros you want to generate. As noted above, when psi = 1, you get a standard Poisson PDF, so all the zeros in your data are assumed to come from a Poisson. When psi = 0, you are saying that your Poisson process ~~NEVER~~ ONLY generates zeros.

So you could set psi ~ Beta(1, 1) if you want to be uninformative, or you could try to look at your data and try to skew it one way or the other, but you’ll probably want to use some kind of beta prior.
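Putting the pieces together, a minimal PyMC3 sketch of a ZIP regression might look like the following. The simulated data, the single predictor `x`, the Normal(0, 2) priors, and the helper names `simulate_zip` and `build_zip_regression` are illustrative assumptions, not something from the original question. PyMC3 is imported inside the model function only so that the data simulation runs without it installed.

```python
import numpy as np


def simulate_zip(n, beta0, beta1, psi, seed=0):
    """Simulate ZIP counts whose Poisson rate depends on one predictor."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    theta = np.exp(beta0 + beta1 * x)      # log link keeps the rate positive
    is_poisson = rng.random(n) < psi       # with probability psi, draw from the Poisson
    y = np.where(is_poisson, rng.poisson(theta), 0)  # otherwise a structural zero
    return x, y


def build_zip_regression(x, y):
    """Build (but do not sample) a ZIP regression model in PyMC3 3.11."""
    import pymc3 as pm  # assumed installed; imported lazily on purpose

    with pm.Model() as model:
        beta0 = pm.Normal("beta0", mu=0.0, sigma=2.0)
        beta1 = pm.Normal("beta1", mu=0.0, sigma=2.0)
        psi = pm.Beta("psi", alpha=1.0, beta=1.0)   # flat prior on the Poisson proportion
        theta = pm.math.exp(beta0 + beta1 * x)      # regression goes into theta
        pm.ZeroInflatedPoisson("y", psi=psi, theta=theta, observed=y)
    return model


x, y = simulate_zip(n=500, beta0=0.5, beta1=0.8, psi=0.7)
```

To fit it, call `model = build_zip_regression(x, y)` and then `pm.sample()` inside `with model:`. I believe newer PyMC releases name the rate parameter `mu` rather than `theta`, so check the signature for your version.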

ricardoV94 · May 27, 2022, 4:51am

You are actually saying you can only generate zeros, all draws come from the zero inflated component.

jessegrabowski · May 27, 2022, 5:00am

Yes, you are exactly right. I missed the \psi in front of the Poisson process in the x \neq 0 branch of the PDF. Thank you for correcting me.
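The two limiting cases can be checked numerically. This is a small standard-library sketch; the function name `zip_pmf` is made up for illustration and is not part of PyMC3.

```python
import math


def zip_pmf(x, psi, theta):
    """ZIP mass function: psi weights the Poisson part, (1 - psi) is extra mass at zero."""
    poisson = math.exp(-theta) * theta**x / math.factorial(x)
    if x == 0:
        return (1.0 - psi) + psi * poisson
    return psi * poisson


theta = 2.0
# psi = 1: the ZIP value at zero matches the ordinary Poisson value exp(-theta).
poisson_only = zip_pmf(0, 1.0, theta)
# psi = 0: all mass sits on zero, so every draw is a zero from the inflation component.
all_zero = [zip_pmf(k, 0.0, theta) for k in range(4)]
```

With `psi = 0` the list `all_zero` puts probability one on `x = 0` and zero everywhere else, which is the point made in the correction above.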

jordan.howell2 · May 27, 2022, 9:48am

Thank you. When you say “logits,” I think of a quantile function to get a probability of a binary outcome. Is that what you mean? How does that reconcile to trying to forecast count data?

jessegrabowski · May 27, 2022, 9:52am

I just mean some quantity you compute in \mathbb R^n before you put it through a link function. I don’t know what the right word for this is in Bayesian lingo; I took the word “logits” from the deep learning literature. Latent variable? Latent value?
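To make the distinction concrete, here is a small sketch in which the same unconstrained quantity is pushed through two different inverse link functions. The function names and the numbers are made up for illustration.

```python
import math


def linear_predictor(features, coefs, intercept):
    """Unconstrained real-valued quantity computed before any link is applied."""
    return intercept + sum(c * f for c, f in zip(coefs, features))


def inverse_logit(eta):
    """Maps any real number to (0, 1): used for a binary-outcome probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def inverse_log(eta):
    """Maps any real number to (0, inf): used for a Poisson rate."""
    return math.exp(eta)


eta = linear_predictor([2.0, -1.0], [0.5, 1.0], 0.0)
probability = inverse_logit(eta)  # what "logits" usually feed in classification
rate = inverse_log(eta)           # what theta = exp(...) does for count data
```

The regression part is the same in both cases; only the final transformation changes, depending on whether the target parameter must lie in (0, 1) or be positive.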

jordan.howell2 · May 27, 2022, 9:54am

Ha! I get so confused with using the right “lingo”. Thank you!
