Balancing Likelihood Terms for Extended LDA Model

Hi there,
I’ve been developing a version of LDA to remove a topic consisting of a known contaminating mixture, and I have two questions about the model specification. The model is trained on a sparse doc x word count matrix X. It is a standard LDA over K-1 topics, but the Kth topic proportion in theta is drawn from a normal distribution with known mean and sd and normalized by each document's row sum in X. In addition, the Kth topic in phi is a known/fixed mixture of words. I'm not trying to learn the Gaussian's parameters or phi K's mixture; I'm using them to constrain the LDA model.
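For concreteness, here is a minimal NumPy sketch of how I intend one document's topic vector to be built (the function and variable names are just for illustration, not part of the model code below):

import numpy as np

def build_theta_row(gamma_row, amb_draw, rowsum):
    # amb_draw is on a log10(count) scale, so convert it to a proportion
    # of the document's total counts and cap it at 1
    f = min(10**amb_draw / rowsum, 1.0)
    # the remaining K-1 topics share the leftover mass 1 - f
    rest = (1 - f) * gamma_row / gamma_row.sum()
    return np.append(rest, f)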

From my understanding, the likelihood of the whole model should be the LDA log likelihood ll plus the Gaussian log likelihood ambientll. I've tried balancing the two terms by scaling them by the variables on which they depend, which works somewhat. Is there a good way to balance the terms in a chimeric model like this?
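For example, a crude version of that scaling would be to average each term over the number of observations it covers and weight the Gaussian term with a hand-tuned factor (sketch only; lam has no principled value):

import theano.tensor as tt

def balance_terms(ll, ambientll, n_nonzero, n_docs, lam=1.0):
    # ll: per-entry LDA log likelihoods (length n_nonzero)
    # ambientll: per-document Gaussian log likelihoods (length n_docs)
    return tt.sum(ll) / n_nonzero + lam * tt.sum(ambientll) / n_docs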

import numpy as np
import pymc3 as pm
import theano.tensor as tt
from theano import shared
from pymc3.distributions.transforms import t_stick_breaking

model1 = pm.Model()
#Number of topics
K=10
#X is a scipy sparse matrix of doc x word counts
(D,V)=X.shape
#beta prior for word over topic distribution (phi)
beta = np.ones((K-1, V))*10
#Flatten X into (doc index, word index, count) triples over its nonzero entries
sparse_array=shared(np.array([X.nonzero()[0],X.nonzero()[1],X.data]).T.astype('int32'))
rowsums=shared(np.sum(X,axis=1).T)
sumall=shared(np.sum(X))

#Concatenate kth topic drawn from gaussian to gammas
#Make rows sum to one to turn gammas into theta dirichlet
def normalizerows(gammas,ambdist,rowsums,alpha=1e-9):
    #Convert the log10-scale ambient draw to a proportion of each doc's counts, capped at 1
    fixambdist=tt.min(tt.concatenate([10**ambdist.T/rowsums,tt.ones(ambdist.T.shape)]),axis=0).reshape([gammas.shape[0],1])
    #Rescale the K-1 gamma draws so each row of theta sums to one
    normalized=10**(tt.log10(1-fixambdist+alpha)+tt.log10(gammas+alpha))/(tt.sum(gammas+alpha,axis=1).reshape([gammas.shape[0],1]))
    normalized = tt.concatenate([normalized,fixambdist],axis=1)
    return normalized

#Set Kth topic in phi to known mixture (saves 10% training time)
def appendphi(phi,phiAmbient):
    return tt.concatenate([phi,phiAmbient.reshape([1,phi.shape[1]])],axis=0)

def log_lda_basic(theta, phi, value):
    #Standard LDA term over the nonzero (doc, word, count) triples
    ll = value[:,2] * pm.math.logsumexp(tt.log(theta[value[:,0].astype('int32')]+1e-10)
                                        + tt.log(phi.T[value[:,1].astype('int32')]+1e-10), axis=1).ravel()
    #Gaussian term: log probability of each doc's ambient counts under the fixed truncated normal
    ambientll=ambdist.distribution.logp(tt.log10((rowsums*theta[:,theta.shape[1]-1])+1e-10))
    #If you don't multiply the LDA term by a large number, ADVI won't fit properly
    return(1e9*(tt.sum(ll))+ tt.sum(ambientll))

with model1:
    #Empirical parameters for the fixed gaussian (ambmu and ambsd are precomputed, not learned)
    ambientmu=pm.Deterministic('ambientmu',shared(ambmu))
    ambientsigma=pm.Deterministic('ambientsigma',shared(ambsd))
    ambdist = pm.TruncatedNormal('ambdist',shape=(D,1),mu=ambientmu,sigma=ambientsigma,lower=.1)
    gammas = pm.Gamma('gammas',alpha=1,beta=1,shape=(D, K-1))
    #Theta is Doc x Topic mixtures
    theta = pm.Deterministic('theta',normalizerows(gammas,ambdist,rowsums))
    phihat = pm.Dirichlet("phihat", a=beta, shape=(K-1, V), transform=t_stick_breaking(1e-9))
    #phi is Topic x Word mixtures; phiAmbient is the known/fixed Kth topic
    phi = pm.Deterministic('phi',appendphi(phihat,shared(phiAmbient)))
    doc = pm.DensityDist('loglikelihood',log_lda_basic,observed=dict(theta=theta, phi=phi,value=sparse_array))

with model1:    
    inference = pm.ADVI()
    approx = pm.fit(n=1000,method= inference,obj_optimizer=pm.adam(learning_rate=shared(.3)))
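A quick way to see how the scaling affects convergence is to plot the objective (negative ELBO) history that pm.fit records:

import matplotlib.pyplot as plt

plt.plot(approx.hist)
plt.xlabel('iteration')
plt.ylabel('-ELBO')
plt.show()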

You can see here that the fixed normal distribution is weighted too heavily: the fitted values of theta K are essentially just the expected value of the fixed distribution. The fixed distribution is in blue and the fitted theta K * rowsums (the number of counts assigned to topic K) is in orange.

[Plot: fixed distribution (blue) vs. fitted theta K * rowsums (orange)]
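Roughly how the comparison can be made, sketched with illustrative plotting code (the scale and binning details may differ from the actual figure):

import matplotlib.pyplot as plt
import scipy.stats as st

trace = approx.sample(500)
rowsum_arr = np.asarray(np.sum(X, axis=1)).ravel()
thetaK_log_counts = np.log10(trace['theta'][:, :, -1] * rowsum_arr + 1e-10).ravel()
a = (0.1 - ambmu) / ambsd  # standardized lower bound, matching lower=.1 in the model
fixed_draws = st.truncnorm.rvs(a, np.inf, loc=ambmu, scale=ambsd, size=5000)
plt.hist(fixed_draws, bins=50, alpha=0.5, label='fixed distribution')
plt.hist(thetaK_log_counts, bins=50, alpha=0.5, label='fitted theta K * rowsums (log10)')
plt.legend()
plt.show()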

My larger question is this: the model is meant to specify that the posterior of theta K matches the fixed/observed normal distribution. Is there a good way to model this? I've thought about using a K-S test between the observed/fixed normal distribution and the posterior of theta K as a likelihood function to force the posterior into the correct shape (although this is very hard to do with Theano), but I'm sure there is a better way to do this. Any input or resources would be very helpful! Thanks!
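To make the K-S idea concrete, this is roughly the quantity I'd want to use as an extra likelihood term, written in plain NumPy/SciPy since I haven't managed to express it in Theano:

import numpy as np
import scipy.stats as st

def ks_penalty(thetaK, rowsums, mu, sd, lower=0.1):
    # K-S statistic between log10(theta_K * rowsums) and the fixed truncated normal
    log_counts = np.log10(thetaK * rowsums + 1e-10)
    a = (lower - mu) / sd
    stat, _ = st.kstest(log_counts, lambda x: st.truncnorm.cdf(x, a, np.inf, loc=mu, scale=sd))
    # return the negative statistic so that a worse fit lowers the "log likelihood"
    return -stat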