How to speed up PyMC model compilation of an LDA topic model

@cluhmann @junpenglao Thanks so much for your responses, they helped a lot. Following up on this, I am building on the LDA model to turn it into a supervised LDA (McAuliffe & Blei, 2007). In that model, each word n in document d gets a topic assignment z_{d,n}, and the document's response is drawn as y_d ~ Normal(eta^T z_bar_d, sigma^2), where z_bar_d = (1/N_d) * sum_n z_{d,n} is the average of the (one-hot encoded) topic assignments.

I am having difficulty writing efficient code to compute the average topic frequencies, i.e., z_bar above, in PyMC, so the model is still very slow to build. Below is my code. Do you have any suggestions for making it more efficient? Thanks so much!

import pymc3 as pm
import theano.tensor as tt
from tqdm import tqdm

def LDA_GLM(omega, y, K, M, N_V, Ni, alpha, gamma):
	with pm.Model() as model:
		eta = pm.Normal('eta', mu=0, sigma=1, shape=K)  # regression coefficients on y
		sigma2 = pm.InverseGamma('sigma2', alpha=1.2, beta=1.5)  # variance of y
		phi = pm.Dirichlet('phi', a=gamma, shape=(K, N_V))  # topic-word matrix
		theta = pm.Dirichlet('theta', a=alpha, shape=(M, K))  # document-topic matrix
		doc = pm.DensityDist('doc', logp_lda_doc(phi, theta), observed=omega)  # word-document likelihood (logp_lda_doc as defined earlier)
		Z = [pm.Categorical('z_{}'.format(d), p=theta[d], shape=Ni[d]) for d in tqdm(range(M))]  # per-word topic assignments
		Z_bar = [[pm.math.sum(tt.eq(Z[d], k)) / Ni[d] for k in range(K)] for d in tqdm(range(M))]  # average topic frequencies per document
		Z_bar = pm.math.stack([pm.math.stack(row) for row in Z_bar], axis=0)  # (M, K) design matrix
		Y = pm.Normal('y', mu=pm.math.dot(Z_bar, eta), sigma=pm.math.sqrt(sigma2), shape=M, observed=y)  # response; Normal takes the std. dev., hence the sqrt
	return model
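
For reference, here is a partially vectorized version of the Z_bar computation I have been experimenting with. It assumes the Theano backend of PyMC3 and uses theano.tensor.extra_ops.to_one_hot, so please treat it as a sketch rather than a definitive fix; it removes the inner Python loop over the K topics, but the loop over the M documents is still there:

from theano.tensor.extra_ops import to_one_hot
import theano.tensor as tt

# One-hot encode each document's topic assignments and average over its words,
# so the per-topic counting loop becomes a single tensor mean.
Z_bar_rows = []
for d in range(M):
	z_onehot = to_one_hot(Z[d], K)            # (Ni[d], K) indicator matrix
	Z_bar_rows.append(z_onehot.mean(axis=0))  # average topic frequencies for document d
Z_bar = tt.stack(Z_bar_rows, axis=0)          # (M, K) design matrix

Even so, I suspect the M separate Categorical variables also contribute to the slow build, so any pointers on vectorizing (or marginalizing out) Z itself would be much appreciated.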