Hi y’all!
I’m working on multinomial count data, but there is zero-inflation for some categories. I wonder how to handle that with PyMC.
Imagine you have several subjects, several observations per subject, and covariates for each observation. At each observation, subjects have to choose between several categories, but sometimes, one or more of the categories are not available for people to choose - this is deliberate, not a mistake.
As a result, you have a mixture of processes:
- one where some categories have 0 counts because people couldn’t choose them
- one with a classical multinomial process between all categories (which can also produce zeros, although very rarely when there are lots of trials)
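To make the two processes above concrete, here is a minimal simulation sketch in plain Python. Everything in it is made up for illustration: the function names (`simulate`, `naive_share`, `share_when_available`), the probabilities, and the rule that only the last category can be withheld. It also assumes that when a category is unavailable, the remaining probabilities simply renormalize over the categories that are left, which may not match the real choice process.

```python
import random

def simulate(n_obs=2000, n_trials=50, p=(0.5, 0.3, 0.2), p_unavailable=0.3, seed=0):
    """Simulate multinomial counts where the last category is sometimes withheld.

    Assumption: an unavailable category gets a structural zero, and the
    remaining categories keep their relative probabilities.
    """
    rng = random.Random(seed)
    k = len(p)
    data = []
    for _ in range(n_obs):
        available = [True] * k
        # Hypothetical rule: only the last category can be made unavailable.
        if rng.random() < p_unavailable:
            available[-1] = False
        cats = [j for j in range(k) if available[j]]
        weights = [p[j] for j in cats]  # random.choices normalizes the weights
        draws = rng.choices(cats, weights=weights, k=n_trials)
        counts = [draws.count(j) for j in range(k)]
        data.append((counts, available))
    return data

def naive_share(data, j):
    """Pooled share of category j, ignoring availability."""
    total = sum(sum(counts) for counts, _ in data)
    return sum(counts[j] for counts, _ in data) / total

def share_when_available(data, j):
    """Pooled share of category j, using only observations where j was offered."""
    rows = [counts for counts, avail in data if avail[j]]
    return sum(counts[j] for counts in rows) / sum(sum(counts) for counts in rows)

data = simulate()
print("naive share of last category:       ", round(naive_share(data, 2), 3))
print("share when the category was offered:", round(share_when_available(data, 2), 3))
```

In this toy setup the naive pooled share of the last category is pulled well below its latent value of 0.2 by the structural zeros, while restricting to the observations where it was offered recovers it. That is the bias I would like the regression to avoid.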
My goal is to infer, in a regression, the latent probability of each category without it being biased by the excess zeros. I feel like a zero-inflated multinomial would be appropriate here, but maybe a censored-data model built with the pm.Mixture class could work too?
I’m not very experienced in these types of models, so I’m really curious about your take! Also, if the ZIMultinomial is the way to go, I noticed it’s not built into PyMC. I would be interested in opening a PR to add it if you find it useful.
Thanks in advance & PyMCheers