Data representation for mixtures of multinomials, and Categorical vs Mixture?

In my opinion, yes: it is better to use a categorical instead of a multinomial. And yes, the categories use zero-based ordering.

Regarding memory consumption, the multinomial does use more memory, because its observations have shape (N, len(p)) instead of (N,). Beyond that, pymc3 uses extra memory for two reasons:

  1. If you run with more than a single core, multiprocessing copies the model (with all its observations) into each process that runs the sampling.
  2. When compiling the model’s logp, there is some cloning involved, which may produce copies in memory rather than just views of existing memory.

If you are memory constrained, use the categorical distribution instead of the multinomial and sample with fewer cores.
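
To make the shape difference concrete, here is a minimal NumPy-only sketch. It assumes each multinomial observation is a single draw (n=1), so every row is one-hot and can be replaced by one zero-based index; `N`, `K` and `p` are made-up illustration values, not taken from your model.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 1000, 5                    # hypothetical sizes, for illustration only
p = np.full(K, 1.0 / K)           # hypothetical category probabilities

# Multinomial representation with n=1: one row of length K per observation.
one_hot = rng.multinomial(1, p, size=N)   # shape (N, K)

# Categorical representation: a single zero-based index per observation.
labels = one_hot.argmax(axis=1)           # shape (N,)

print("multinomial:", one_hot.shape, one_hot.nbytes, "bytes")
print("categorical:", labels.shape, labels.nbytes, "bytes")
```

The saving grows with the number of categories. If your rows hold counts from more than one draw, this conversion does not apply as written.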

Regardless, I imagine it will be very hard to get good mixing of two discrete distributions. I strongly recommend that you stick with the mixture distribution rather than modelling the latent class as an explicit categorical variable.
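
The reason the mixture helps is that it sums the latent class out of the likelihood, so the sampler never has to move a discrete variable. A rough NumPy sketch of that marginal log-likelihood for a mixture of categoricals is below; `mixture_logp`, `w`, `p` and `labels` are hypothetical names and values for illustration, not pymc3 API.

```python
import numpy as np

def mixture_logp(labels, w, p):
    """Marginal log-likelihood of zero-based labels under a mixture of categoricals.

    labels: (N,) integer observations in 0..K-1
    w:      (M,) mixture weights summing to 1
    p:      (M, K) category probabilities, one row per component
    """
    # log p(y_i | component m), arranged as shape (N, M)
    comp_logp = np.log(p[:, labels]).T
    # Add the log weights, then sum out the component with log-sum-exp.
    per_obs = np.logaddexp.reduce(np.log(w) + comp_logp, axis=1)
    return per_obs.sum()

# Hypothetical two-component, two-category example.
w = np.array([0.3, 0.7])
p = np.array([[0.5, 0.5],
              [0.2, 0.8]])
labels = np.array([0, 1])
print(mixture_logp(labels, w, p))
```

Only `w` and `p` remain as unknowns here, and both are continuous, which is what makes the marginalized form easier to sample.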
