How to implement supervised LDA model more efficiently on PyMC

Yixiang_Xu · August 17, 2022, 5:33am

I am building a supervised LDA model in PyMC based on the following algorithm (McAuliffe, Blei 2007):
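The algorithm figure is not reproduced here. For reference, this is the response step of sLDA as we read the paper cited above (our transcription, not the missing figure). It uses the notation of the code below: N_d = Ni[d], z_{d,n} is the one-hot K-vector for the topic of word n in document d, and sigma^2 is a variance:

```latex
\bar{z}_d = \frac{1}{N_d}\sum_{n=1}^{N_d} z_{d,n},
\qquad
y_d \mid z_{d,1:N_d}, \eta, \sigma^2 \sim \mathcal{N}\left(\eta^\top \bar{z}_d,\ \sigma^2\right)
```

Stacking the \bar{z}_d row-wise gives an M × K design matrix whose rows each sum to 1.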

I am having difficulty writing efficient PyMC code to compute the average topic frequencies (i.e., z_bar in the algorithm above), and running the code below is very slow. What is the best way to compute the average topic frequencies and implement supervised LDA in PyMC?
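Here is a minimal NumPy sketch of the quantity being asked about. It assumes the topic assignments for the whole corpus are available as one flat integer array z plus a per-word document index doc_idx. Both names, and topic_frequencies, are hypothetical. It works on concrete values (e.g. a single posterior draw) rather than inside the symbolic PyMC model, and it replaces the M × K separate comparisons with a single bincount over combined (document, topic) indices.

```python
import numpy as np

def topic_frequencies(z, doc_idx, n_docs, n_topics):
    """Average topic frequencies z_bar (n_docs x n_topics) from flat assignments.

    z       : int array, topic of every word in the corpus (values 0..n_topics-1)
    doc_idx : int array, same length as z, document of every word
    """
    # Encode each (document, topic) pair as one bin index d * K + k
    flat = doc_idx * n_topics + z
    counts = np.bincount(flat, minlength=n_docs * n_topics).reshape(n_docs, n_topics)
    # Document lengths N_d, used to turn counts into frequencies
    lengths = np.bincount(doc_idx, minlength=n_docs)
    return counts / lengths[:, None]

# Toy corpus: document 0 has 3 words, document 1 has 2 words, K = 2 topics
z = np.array([0, 1, 1, 0, 0])
doc_idx = np.array([0, 0, 0, 1, 1])
z_bar = topic_frequencies(z, doc_idx, n_docs=2, n_topics=2)
# z_bar rows: [1/3, 2/3] for document 0 and [1, 0] for document 1
```

Every document is assumed to contain at least one word; an empty document would cause a division by zero.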

My code (modified from the PyMC LDA code here):

import numpy as np
import pymc3 as pm  # DensityDist(name, logp, observed=...) below is the PyMC3 call signature
from tqdm import tqdm

# logp_lda_doc(phi, theta): document log-likelihood factory from the linked PyMC LDA example (not shown here)
def LDA_GLM(omega, y, K, M, N_V, Ni, alpha, gamma):
    with pm.Model() as model:
        eta = pm.Normal('eta', mu=0, sigma=1, shape=K)  # coefficients of the linear regression on y
        sigma2 = pm.InverseGamma('sigma2', alpha=1.2, beta=1.5)  # variance of y
        phi = pm.Dirichlet('phi', a=gamma, shape=(K, N_V))  # topic-word matrix
        theta = pm.Dirichlet('theta', a=alpha, shape=(M, K))  # document-topic matrix
        doc = pm.DensityDist('doc', logp_lda_doc(phi, theta), observed=omega)  # omega: observed word-document counts
        Z = [pm.Categorical('z_{}'.format(d), p=theta[d], shape=Ni[d]) for d in tqdm(range(M))]  # topic assignments
        # Theano tensors do not overload ==, so use pm.math.eq; compare each document against all K topics at once
        Z_bar = [pm.math.sum(pm.math.eq(Z[d][:, None], np.arange(K)), axis=0) / Ni[d] for d in tqdm(range(M))]  # average topic frequencies
        Z_bar = pm.math.stack(Z_bar, axis=0)  # M x K design matrix
        Y = pm.Normal('y', mu=pm.math.dot(Z_bar, eta), sigma=pm.math.sqrt(sigma2), shape=M, observed=y)  # sigma is a standard deviation
    return model
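If all topic assignments are kept in one flat vector, the per-document loop that builds Z_bar in LDA_GLM can in principle be collapsed into matrix products. The form is z_bar = A @ onehot(z), where A is a fixed M × (total words) averaging matrix with A[d, n] = 1/N_d when word n belongs to document d. The NumPy sketch below only checks that algebra on toy values. averaging_matrix and regression_mean are hypothetical helper names, and we have not measured how much this speeds up sampling in PyMC, where the one-hot step would be pm.math.eq and the products pm.math.dot.

```python
import numpy as np

def averaging_matrix(Ni):
    """A (M x total_words) with A[d, n] = 1 / N_d if word n belongs to document d, else 0."""
    Ni = np.asarray(Ni)
    total = Ni.sum()
    doc_idx = np.repeat(np.arange(len(Ni)), Ni)  # document index of each word
    A = np.zeros((len(Ni), total))
    A[doc_idx, np.arange(total)] = 1.0 / Ni[doc_idx]
    return A

def regression_mean(z, Ni, eta):
    """mu_d = eta^T z_bar_d for every document, without a per-document loop."""
    K = len(eta)
    onehot = (z[:, None] == np.arange(K)).astype(float)  # (total_words x K) one-hot topics
    z_bar = averaging_matrix(Ni) @ onehot                  # (M x K) average topic frequencies
    return z_bar @ eta                                     # (M,) mean of y per document

# Toy example: two documents with 3 and 2 words, K = 2 topics
Ni = [3, 2]
z = np.array([0, 1, 1, 0, 0])
eta = np.array([1.0, -1.0])
mu = regression_mean(z, Ni, eta)
```

Here A is dense, so its size grows with M times the corpus length. For a real corpus, a sparse matrix would be the more realistic choice.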

Thanks!!

junpenglao · August 17, 2022, 8:16am

Hi @Yixiang_Xu, please follow up on your original post, How to speed up Pymc model compilation of a LDA topic model (#4 by Yixiang_Xu), instead of opening a new post.
I am closing this post for now.

Related topics

How to speed up Pymc model compilation of a LDA topic model · Questions · 9 replies · 1263 views · August 23, 2022
LDA implementation with pymc3 · Questions · 21 replies · 5802 views · August 22, 2020
Supervised Topic Models in PyMC3 · Questions · 0 replies · 398 views · January 11, 2021
Questions from replicating Latent Dirichlet Allocation work · Questions · 1 reply · 425 views · June 14, 2021
AEVB Implementation Question: LDA model with 2 latent variables using AEVB · Questions · 5 replies · 814 views · September 26, 2019