Hi,
@junpenglao
I am working with a very large dataset for LDA, and I want to use all of the data at once for posterior inference. I am trying to use theano.sparse.as_sparse_variable
so that I can pass the result as the observed data in my log-likelihood calculation.
theano_sparse_data = theano.sparse.as_sparse_variable(sparse_data)

def log_lda(theta, phi):
    def ll_lda(value):
        dixs, vixs = value.nonzero()
        vfreqs = value[dixs, vixs]
        ll = vfreqs * pm.math.logsumexp(t.log(theta[dixs]) + t.log(phi.T[vixs]), axis=1).ravel()
        return t.sum(ll)
    return ll_lda

with model:
    theta = pm.Dirichlet("thetas", a=alpha, shape=(D, K))
    phi = pm.Dirichlet("phis", a=beta, shape=(K, V))
    doc = pm.DensityDist('doc', log_lda(theta, phi), observed=theano_sparse_data)
I am trying to find a way to make something like this work so that I can use all my data at once.
P.S.: I can't convert the matrix to dense because that runs into a MemoryError.
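To make the intent concrete, here is a minimal NumPy/SciPy sketch of the quantity ll_lda is meant to compute, using only the nonzero entries of the document-term matrix. It is not Theano/PyMC3 code and is not a fix for the problem above. It assumes sparse_data is a SciPy sparse matrix of shape (D, V); the function name lda_loglik and the toy data are hypothetical and only for illustration.

```python
import numpy as np
from scipy import sparse
from scipy.special import logsumexp


def lda_loglik(counts, theta, phi):
    """Log-likelihood from nonzero counts only (hypothetical helper).

    counts: (D, V) SciPy sparse matrix of word counts
    theta:  (D, K) document-topic probabilities
    phi:    (K, V) topic-word probabilities
    """
    coo = sparse.coo_matrix(counts)
    coo.sum_duplicates()  # merge repeated (row, col) entries, if any
    dixs, vixs, vfreqs = coo.row, coo.col, coo.data
    # log p(word v | doc d) = logsumexp over k of log theta[d, k] + log phi[k, v]
    log_p = logsumexp(np.log(theta[dixs]) + np.log(phi.T[vixs]), axis=1)
    return float(np.sum(vfreqs * log_p))


# Toy data (made up): 4 documents, 3 topics, 6 vocabulary words.
rng = np.random.default_rng(0)
D, K, V = 4, 3, 6
counts = sparse.csr_matrix(np.array([
    [2, 0, 0, 1, 0, 0],
    [0, 3, 0, 0, 0, 1],
    [0, 0, 0, 0, 4, 0],
    [1, 0, 2, 0, 0, 0],
]))
theta = rng.dirichlet(np.ones(K), size=D)  # shape (D, K)
phi = rng.dirichlet(np.ones(V), size=K)    # shape (K, V)

ll = lda_loglik(counts, theta, phi)
```

Only the three 1-D arrays dixs, vixs and vfreqs are needed here, each with one entry per nonzero count, so the dense (D, V) matrix is never built.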
Any help would be much appreciated.
Thanks in advance.