You should read it as a sum over all the (latent) topics; the word is the observed variable. The general idea: for each observed word, use its tokenized index to pick out that word's entry in every topic's distribution, then sum over the topics. Do this for every observed word, and finally sum over the resulting vector.
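A minimal NumPy sketch of that idea, assuming a topic-word matrix `phi` of shape `(K, V)` and per-document topic weights `theta` of shape `(K,)` (all names and shapes here are illustrative assumptions, not from the original):

```python
import numpy as np

# Hypothetical setup: K latent topics over a vocabulary of V words.
rng = np.random.default_rng(0)
K, V = 3, 10
phi = rng.random((K, V))
phi /= phi.sum(axis=1, keepdims=True)   # each topic is a distribution over words
theta = rng.random(K)
theta /= theta.sum()                    # topic weights for one document

words = np.array([2, 5, 2, 7])          # observed, tokenized word indices

# For each observed word, index across all topics, then sum over topics:
#   p(word w) = sum_k theta[k] * phi[k, w]
per_word = phi[:, words]                # shape (K, n_words): one column per word
p_words = theta @ per_word              # shape (n_words,): summed over topics

# Finally, reduce the resulting vector over all observed words
# (here via log-probabilities, as in a log-likelihood):
total = np.log(p_words).sum()
```

The fancy indexing `phi[:, words]` does the "use the observed index to index across all topics" step in one shot, and the matrix-vector product performs the sum over topics for every word at once.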