I am trying to implement an LDA model in PyMC with 1,000 documents; each document has around 150 - 200 words and the vocabulary size is 500. I wrote the code below and it works on a smaller sample (e.g., 10 documents). However, the code takes a very long time to compile at the full sample size (1,000 documents, 150 - 200 words each, vocabulary of 500). The most time-consuming part is the last statement in the model, which generates omega (the observed words): it is estimated to take 3 - 5 hours just to build the graph, before sampling even starts. Are there any suggestions on how to speed up the model compilation?

FYI, the code is running on Colab Pro+.

Code:

```
import numpy as np
import pymc3 as pm  # or: import pymc as pm (PyMC v4+)
from tqdm import tqdm

def LDA_GLM(omeg, Ni, K, M, N_V, alpha, gamma):
    # omeg: the documents (one array of word indices per document)
    # Ni: word count per document
    # K: number of topics; M: number of documents; N_V: vocabulary size
    with pm.Model() as model:
        # topic-vocabulary matrix
        phi = pm.Dirichlet('phi', a=gamma, shape=(K, N_V))
        # document-topic matrix
        theta = pm.Dirichlet('theta', a=alpha, shape=(M, K))
        # assign a topic to each word token
        DATA_d = [pm.Categorical("z_{}".format(d), p=theta[d], shape=Ni[d])
                  for d in tqdm(range(M))]
        # assign a word to each token based on its topic (observed data)
        omega = [pm.Categorical("w_%i_%i" % (d, i),
                                p=phi[DATA_d[d][i]],
                                observed=omeg[d][i])
                 for d in tqdm(range(M)) for i in range(Ni[d])]
    return model

M = 1000
K = 3
N_V = 500
alpha = np.array([10.0, 5.0, 8.0])
gamma = np.random.choice([10, 20, 30], N_V)
ldaglm = LDA_GLM(omega, Ni, K, M, N_V, alpha, gamma)
```
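
For completeness, here is a small sketch that generates synthetic `omega` and `Ni` so the snippet above is runnable end-to-end. The real inputs come from my corpus; the shapes and sizes below just mirror the description (1,000 documents, 150 - 200 words each, vocabulary of 500):

```python
import numpy as np

rng = np.random.default_rng(0)

M = 1000                             # number of documents
N_V = 500                            # vocabulary size
Ni = rng.integers(150, 201, size=M)  # word count per document (150 - 200 inclusive)
# one array of word indices (0 .. N_V-1) per document
omega = [rng.integers(0, N_V, size=n) for n in Ni]
```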