@lucianopaz and others - related question: is it possible to specify the observed data as a collection of counts, rather than individual observations? E.g. going back to my initial example of a 3-bin model:
[0, 1, 2]
This is currently passed to the categorical mixture model as
[2,2,1]
This is OK for smaller datasets, but as I am modeling DNA sequencing data with millions of molecules that fall into a relatively small number of bins, it becomes limiting unless pymc3 actually represents things as counts under the hood (which I don’t think it does in my current formulation).
Put another way, the cost of computing the likelihood of a simple binomial model shouldn’t depend on the total number of observations, since you’re just raising p and 1-p to some potentially large powers.
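To make that concrete, here is a minimal plain-Python sketch (no pymc3 involved) of the equivalence I have in mind. It assumes the counts [2, 2, 1] are per-bin tallies over bins 0, 1, 2. The probabilities, the expanded observation list, and the function names are made up for illustration. The multinomial coefficient is left out because it is constant in p.

```python
import math
from collections import Counter

def loglik_per_observation(obs, p):
    """Sum log p[bin] over every individual observation: O(N) work."""
    return sum(math.log(p[k]) for k in obs)

def loglik_from_counts(counts, p):
    """Same quantity from per-bin counts: O(number of bins) work."""
    return sum(n * math.log(pk) for n, pk in zip(counts, p) if n > 0)

# Hypothetical 3-bin example; these probabilities are illustrative only.
p = [0.5, 0.3, 0.2]

# One possible set of individual observations (bin indices) ...
obs = [0, 0, 1, 1, 2]

# ... and the same data collapsed into per-bin counts.
tally = Counter(obs)
counts = [tally[k] for k in range(len(p))]

ll_obs = loglik_per_observation(obs, p)
ll_counts = loglik_from_counts(counts, p)

# The two log-likelihoods agree up to floating-point error.
print(counts, math.isclose(ll_obs, ll_counts))
```

With millions of molecules the first function loops over every observation, while the second still only loops over the three bins, which is the behaviour I am hoping to get from the model.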
So: is this a matter of reformulating my model, or are there native ways to encode the observed data as counts, so that the arrays don’t have to be as long as my dataset?