How to implement Zero-Inflated Multinomial?

AlexAndorra · October 16, 2019, 3:24pm

I got all my data and can confirm that the zero-inflated categories are mostly 3 and 4: 25% of zeros in the sample – category 1’s zero-inflation is near-negligible, with about 5% zeros.
I read lots of papers, and it seems to be an active area of research! From what I understood, there are two options - none of which I know how to implement in PyMC:

Generalizing Diallo et al., for every observation i, we assume that given the total C_{1i} + C_{2i} + ... + C_{6i} + C_{7i} = N_i, the multinomial response C_i = (C_{1i}, C_{2i}, ..., C_{7i}) is generated from the model:

\begin{equation} C_i \sim \begin{cases} (C_{1i}, C_{2i}, 0, C_{4i}, ..., C_{7i}), & \text{with probability $\pi$}\\ (C_{1i}, C_{2i}, C_{3i}, 0, ..., C_{7i}), & \text{with probability $\phi$}\\ mult(N_i, p_i), & \text{with probability $1- \pi - \phi$} \end{cases} \end{equation}

\text{where}\ p_i = (p_{1i}, ..., p_{7i}) \ \text{and}\ \Sigma_j(p_{ji}) = 1

Reverse the thinking and model the non-zero-inflated categories (2, 5 and 6) as baseline-inflated: because they were available all the time, some of their numbers are inflated compared to their true proportion, when all categories are available. Thus, following Cai et al., if p(k|\theta) denotes a multinomial PMF with 7 categories, the model’s likelihood is:

\begin{equation} p(Y_i = k) = \begin{cases} \pi_k +(1 - \pi_2 - \pi_5 - \pi_6)\ p(k|\theta), & \text{if $k = 2, 5, 6$}\\ (1 - \pi_2 - \pi_5 - \pi_6)\ p(k|\theta), & \text{otherwise} \end{cases} \end{equation}

where k is the kth category in the multinomial distribution, and \pi_k is the probability of inflation of category k.

I prefer option 1 (more intuitive, less convoluted), but which is easier to implement in PyMC? How would I even go about it?
A huge thanks in advance for your help
(of course, please tell me if there is any mistake above)

Topic		Replies	Views
Data representation for mixtures of multinomials, and Categorical vs Mixture? Questions	5	999	April 17, 2019
Why my multinomial model with categorical predictors and response differs so much from results in R? version agnostic modeling	5	75	January 26, 2025
Multinomial mixture with observed-values themselves as mixtures Questions	8	1030	March 29, 2019
How to create model with series of multinomials, each of which has observations on only a subset of components? v5 modeling	1	339	November 27, 2022
"Generalized ZeroInflatedPoisson" / observed deterministic variable Questions	2	449	December 4, 2020

How to implement Zero-Inflated Multinomial?

Related topics