Balancing Likelihood Terms for Extended LDA Model

Hi there,
I’ve been developing a version of LDA that removes a topic consisting of a known contaminating mixture, and I have two questions about the model specification. The model is trained on a sparse matrix X of doc x word counts. It is a standard LDA model with K-1 free topics, plus two constraints: in theta, the Kth topic proportion is drawn from a normal distribution with known mean and sd and normalized by the row sums of X; in phi, the Kth topic is a known/fixed mixture of words. I’m not trying to learn the Gaussian’s parameters or phi K’s mixture; I’m using them to constrain the LDA model.
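For concreteness, here is a minimal numpy sketch of that theta construction. The names (D, K, rowsums, gammas, ambdist) follow the post, but every number here is made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 4, 3                                                  #documents, topics (incl. ambient Kth)
rowsums = rng.integers(100, 200, size=(D, 1)).astype(float)  #per-document word counts

gammas = rng.gamma(1.0, 1.0, size=(D, K - 1))                #draws for the K-1 free topics
ambdist = rng.normal(20.0, 2.0, size=(D, 1)).clip(0.1)       #fixed-Gaussian ambient counts

fixamb = ambdist / rowsums                                   #Kth topic proportion per document
rest = gammas / gammas.sum(axis=1, keepdims=True) * (1 - fixamb)
theta = np.concatenate([rest, fixamb], axis=1)               #each row sums to one
```

Each row of theta is then a valid topic mixture with column K pinned to the scaled Gaussian draw.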

From my understanding, the likelihood of the whole model should be the LDA log likelihood ll plus the Gaussian log likelihood ambientll. I’ve tried balancing the two terms by scaling each by the number of variables it depends on, which works somewhat. Is there a good way to balance the terms in a chimeric model like this?
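To make that balancing idea concrete, here is one common heuristic as a numpy sketch (the counts are made up, and lam is a hypothetical knob, not part of the model above): average each log-likelihood term over the number of observations it sums over, so neither term dominates just because it has more data points, then control the trade-off with one explicit weight.

```python
import numpy as np

rng = np.random.default_rng(0)
n_counts = 1_000_000   #nonzero doc-word entries the LDA term sums over (made up)
n_docs = 5_000         #documents, one Gaussian term each (made up)

ll = rng.normal(-3.0, 1.0, n_counts).sum()        #stand-in for the LDA log likelihood
ambientll = rng.normal(-1.0, 0.5, n_docs).sum()   #stand-in for the Gaussian term

#Per-observation averages put both terms on a comparable scale;
#lam is a hypothetical weight for the Gaussian constraint.
lam = 1.0
balanced = ll / n_counts + lam * ambientll / n_docs
```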

model1 = pm.Model()
#K is the number of topics
#X is a scipy sparse matrix of doc x word counts
#Beta prior for the per-topic word distribution (phi)
beta = np.ones((K-1, V))*10

#Concatenate kth topic drawn from gaussian to gammas
#Make rows sum to one to turn gammas into theta dirichlet
def normalizerows(gammas, ambdist, rowsums, alpha=1e-9):
    #Kth topic proportion: ambient counts scaled by per-document totals
    fixambdist = ambdist / rowsums
    #Renormalize the K-1 gamma draws so each row of theta sums to one
    normalized = gammas / (gammas.sum(axis=1, keepdims=True) + alpha) * (1 - fixambdist)
    return tt.concatenate([normalized, fixambdist], axis=1)

#Set Kth topic in phi to known mixture (saves 10% training time)
def appendphi(phi,phiAmbient):
    return tt.concatenate([phi,phiAmbient.reshape([1,phi.shape[1]])],axis=0)

def log_lda_basic(theta, phi, value):
    #value columns: doc index, word index, count; ambientll is defined globally
    ll = value[:,2] * pm.math.logsumexp(
        tt.log(theta[value[:,0].astype('int32')] + 1e-10)
        + tt.log(phi.T[value[:,1].astype('int32')] + 1e-10),
        axis=1).ravel()
    #If you don't multiply the whole likelihood by a large number, ADVI won't fit properly
    return 1e9*tt.sum(ll) + tt.sum(ambientll)

with model1: 
    #Empirical parameters for fixed gaussian
    ambdist = pm.TruncatedNormal('ambdist',shape=(D,1),mu=ambientmu,sigma=ambientsigma,lower=.1)
    gammas = pm.Gamma('gammas',alpha=1,beta=1,shape=(D, K-1))
    #Theta is Doc x Topic mixtures
    theta = pm.Deterministic('theta',normalizerows(gammas,ambdist,rowsums))
    phihat = pm.Dirichlet("phihat", a=beta, shape=(K-1, V), transform=t_stick_breaking(1e-9))
    #phi is Topic x Word mixtures
    phi = pm.Deterministic('phi',appendphi(phihat,shared(phiAmbient)))
    doc = pm.DensityDist('loglikelihood',log_lda_basic,observed=dict(theta=theta, phi=phi,value=sparse_array))

with model1:    
    inference = pm.ADVI()
    approx = pm.fit(method=inference, obj_optimizer=pm.adam(learning_rate=shared(.3)))

You can see here that the fixed normal distribution is weighted too heavily: the values of theta K end up essentially at the expected value of the fixed distribution. The fixed distribution is in blue and the fitted thetaK * rowsums (the number of counts belonging to topic K) is in orange.


My larger question is this: the model is meant to specify that the posterior of theta K is equivalent to the fixed/observed normal distribution. Is there a good way to model this? I’ve thought about using a K-S test between the observed/fixed normal distribution and the posterior of theta K as a likelihood function to force the posterior into the correct shape (although this is very hard to do with theano), but I’m sure there is a better way to do this. Any input or resources would be very helpful! Thanks!
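In case it helps frame answers: one simpler, differentiable stand-in for a K-S statistic that I’ve considered is matching the first two moments of the fitted thetaK counts to the fixed Normal. A numpy sketch of the idea (moment_match_penalty and the weight w are hypothetical names; ambientmu/ambientsigma are the fixed parameters from the model):

```python
import numpy as np

def moment_match_penalty(thetaK_counts, ambientmu, ambientsigma, w=1.0):
    #Penalize mismatch between the empirical mean/sd of the fitted
    #thetaK * rowsums and the fixed Normal(ambientmu, ambientsigma)
    m = thetaK_counts.mean()
    s = thetaK_counts.std()
    return -w * ((m - ambientmu) ** 2 + (s - ambientsigma) ** 2)

rng = np.random.default_rng(1)
sample = rng.normal(5.0, 2.0, 10_000)   #stand-in for fitted thetaK * rowsums
#A sample that already matches the target incurs a near-zero penalty
p = moment_match_penalty(sample, ambientmu=5.0, ambientsigma=2.0)
```

In the PyMC3 model a term like this could presumably be attached with pm.Potential, though I don’t know whether the first two moments alone constrain the shape strongly enough.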