Hi everyone!

I have no previous experience with marginalizing discrete variables out of a model so that PyMC3 can use NUTS, and I don't see how to generalize the cases discussed in the literature to mine - I am probably missing something.

Here is the model I want to work with:
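In outline, the likelihood (as in the simulation code below) is

e_i \mid c_i, \zeta \;\sim\; \mathcal{N}\!\left(\mu_i(\zeta, c_i),\ \sigma^2\right)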

Where:

- \textbf{e} is a vector whose i-th element is the cost for observation i (observed)
- \textbf{c} is a vector whose i-th element is the category of observation i (observed)
- \sigma, \alpha_0, \alpha_1 are continuous (unobserved)
- \zeta is discrete (this is the problematic one), taking one of m possible values (unobserved)

In a different notation and somewhat more specifically:
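Concretely, with a normal likelihood as in the simulation code below:

e_i \sim \mathcal{N}(\mu_i, \sigma^2), \qquad \mu_i = \alpha_0 + \alpha_1 \, L_{\zeta,\, c_i}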

Where L is a (known) matrix whose (i,j)-th element is some feature, determined by the i-th possible value of \zeta and the j-th category, that the cost depends on (among other things).

Ultimately, I am interested in finding the probability of each possible \zeta being the true one.

Here’s the code to produce some fake data:

```
import numpy as np

# true parameter values
sigma = 0.1
a_0, a_1 = 0, 1
z = 3  # true value of zeta (a row index into L)
L = np.array([
    [1, 2, 1],
    [3, 2, 1],
    [1, 4, 5],
    [5, 6, 8],
])
# category of the i-th observation: 1000 observations per category
category_i = np.repeat(np.arange(L.shape[1]), 1000)
mu_i = a_0 + a_1 * L[z, category_i]
outcome_i = np.random.normal(loc=mu_i, scale=sigma)
```

And here is the most straightforward PyMC3 model to fit the data:

```
import pymc3 as pm
import theano

with pm.Model() as model:
    sigma = pm.HalfNormal('sigma', sigma=2)
    a_0 = pm.Normal('a_0', mu=0, sigma=1)
    a_1 = pm.Normal('a_1', mu=0, sigma=1)
    # uniform prior over the m possible values of zeta
    z = pm.Categorical('z', p=np.ones(L.shape[0]) / L.shape[0])
    mu_i = a_0 + a_1 * theano.shared(L)[z][category_i]
    outcomes = pm.Normal(
        'outcomes',
        mu=mu_i,
        sigma=sigma,
        observed=outcome_i,
    )
    trace = pm.sample(
        cores=1,
        return_inferencedata=True,
    )
```

The results are pretty good, with estimates close to the true values:

However, I will eventually have to fit a *huge* amount of data (30k possible values for \zeta and millions of observations), so efficiency is crucial (and I may end up using the variational API). But the discrete parameter \zeta forces PyMC3 to use CategoricalGibbsMetropolis, which makes sampling slower than it could be. Is there some way to manipulate the model so that, instead of sampling \zeta directly, a probability vector over its values is sampled?

All the cases I've seen (e.g. here) use a Normal mixture, but those models include a Dirichlet distribution even in the non-marginalized version, so I am unsure how to adapt them to my model above. I am also unsure whether marginalizing \zeta out is the right approach at all, since \zeta is precisely the value I am interested in!
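Or is the following the way to get \zeta back after marginalizing? Given posterior draws of (\alpha_0, \alpha_1, \sigma), re-weight each possible \zeta by its data likelihood under each draw and average (pure NumPy sketch, names mine; the toy call below plugs in the true parameters rather than real posterior draws):

```
import numpy as np
from scipy.special import logsumexp

def zeta_posterior(a0_draws, a1_draws, sigma_draws, L, category_i, outcome_i):
    """Estimate p(zeta = z | data) by averaging, over posterior draws of
    (a_0, a_1, sigma), the normalized per-z likelihood of the data
    (a uniform prior over zeta cancels in the normalization)."""
    m = L.shape[0]
    probs = np.zeros(m)
    for a0, a1, s in zip(a0_draws, a1_draws, sigma_draws):
        mu = a0 + a1 * L[:, category_i]          # shape (m, n)
        resid = outcome_i[None, :] - mu
        # Gaussian log-likelihood of the whole data set for each z
        logp_z = (-0.5 * (resid / s) ** 2
                  - np.log(s) - 0.5 * np.log(2 * np.pi)).sum(axis=1)
        probs += np.exp(logp_z - logsumexp(logp_z))
    return probs / len(a0_draws)

# toy check: fake data generated with z = 3, evaluated at the true parameters
L = np.array([[1, 2, 1], [3, 2, 1], [1, 4, 5], [5, 6, 8]], dtype=float)
category_i = np.repeat(np.arange(L.shape[1]), 100)
outcome_i = np.random.normal(loc=L[3, category_i], scale=0.1)
p = zeta_posterior([0.0], [1.0], [0.1], L, category_i, outcome_i)
# p should put almost all of its mass on z = 3
```

Is this post-processing correct, or does marginalization lose information about \zeta that cannot be recovered this way?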

Thank you very much for your help, any pointers/hints would be much appreciated!