How to speed up PyMC model compilation of an LDA topic model

@cluhmann @junpenglao Thanks so much for your responses, they helped a lot. Following up on this, I am building on the LDA model to turn it into a supervised LDA (McAuliffe & Blei, 2007). In that model, each word n in document d gets a topic assignment z_{d,n}, and the document's response is drawn as y_d ~ Normal(eta^T z_bar_d, sigma^2), where z_bar_d = (1/N_d) * sum_n z_{d,n} is the average of the (one-hot encoded) topic assignments.

I am having difficulty writing efficient code to compute the average topic frequencies, i.e., z_bar above, in PyMC, so the model is still very slow to build. Below is my code. Do you have any suggestions for making it more efficient? Thanks so much!

import pymc3 as pm
import theano.tensor as tt
from tqdm import tqdm

def LDA_GLM(omega, y, K, M, N_V, Ni, alpha, gamma):
	with pm.Model() as model:
		eta = pm.Normal('eta', mu=0, sigma=1, shape=K)  # regression coefficients on y
		sigma2 = pm.InverseGamma('sigma2', alpha=1.2, beta=1.5)  # variance of y
		phi = pm.Dirichlet('phi', a=gamma, shape=(K, N_V))  # topic-word matrix
		theta = pm.Dirichlet('theta', a=alpha, shape=(M, K))  # document-topic matrix
		doc = pm.DensityDist('doc', logp_lda_doc(phi, theta), observed=omega)  # word-document likelihood (logp_lda_doc as defined earlier)
		Z = [pm.Categorical('z_{}'.format(d), p=theta[d], shape=Ni[d]) for d in tqdm(range(M))]  # per-word topic assignments
		Z_bar = [[pm.math.sum(tt.eq(Z[d], k)) / Ni[d] for k in range(K)] for d in tqdm(range(M))]  # average topic frequencies per document
		Z_bar = pm.math.stack([pm.math.stack(row) for row in Z_bar], axis=0)  # (M, K) design matrix
		Y = pm.Normal('y', mu=pm.math.dot(Z_bar, eta), sigma=pm.math.sqrt(sigma2), shape=M, observed=y)  # response; Normal takes the std. dev., hence the sqrt
	return model
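
For reference, here is a partially vectorized version of the Z_bar computation I have been experimenting with. It assumes the Theano backend of PyMC3 and uses theano.tensor.extra_ops.to_one_hot, so please treat it as a sketch rather than a definitive fix; it removes the inner Python loop over the K topics, but the loop over the M documents is still there:

from theano.tensor.extra_ops import to_one_hot
import theano.tensor as tt

# One-hot encode each document's topic assignments and average over its words,
# so the per-topic counting loop becomes a single tensor mean.
Z_bar_rows = []
for d in range(M):
	z_onehot = to_one_hot(Z[d], K)            # (Ni[d], K) indicator matrix
	Z_bar_rows.append(z_onehot.mean(axis=0))  # average topic frequencies for document d
Z_bar = tt.stack(Z_bar_rows, axis=0)          # (M, K) design matrix

Even so, I suspect the M separate Categorical variables also contribute to the slow build, so any pointers on vectorizing (or marginalizing out) Z itself would be much appreciated.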