I am working with a very large dataset for LDA, and I want to use all of the data at once for posterior inference. I am trying to use
theano.sparse.as_sparse_variable so that I can pass the result as the observed value in my log-likelihood calculation.

    # Imports assumed from the aliases used below
    import pymc3 as pm
    import theano
    import theano.sparse
    import theano.tensor as t

    theano_sparse_data = theano.sparse.as_sparse_variable(sparse_data)

    def log_lda(theta, phi):
        def ll_lda(value):
            # Row (document) and column (vocabulary) indices of the nonzero counts
            dixs, vixs = value.nonzero()
            vfreqs = value[dixs, vixs]
            ll = vfreqs * pm.math.logsumexp(
                t.log(theta[dixs]) + t.log(phi.T[vixs]), axis=1
            ).ravel()
            return t.sum(ll)
        return ll_lda

    with model:
        theta = pm.Dirichlet("thetas", a=alpha, shape=(D, K))
        phi = pm.Dirichlet("phis", a=beta, shape=(K, V))
        doc = pm.DensityDist('doc', log_lda(theta, phi), observed=theano_sparse_data)

I am trying to find a way to make something like this work so that I can use all my data at once.
P.S.: I can't convert the matrix to dense because that runs into a MemoryError.
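
To make clear what I am trying to compute: the log likelihood only needs the nonzero entries of the document-term matrix, so in principle nothing dense is required. Below is a minimal plain-Python sketch of that quantity on a toy example. The function name `sparse_lda_loglik` and the `(doc, word, count)` triplet layout are made up for illustration; with a scipy.sparse COO matrix, I assume the triplets would correspond to its `row`, `col` and `data` attributes. This is not Theano code and is only meant to pin down the formula.

```python
import math


def logsumexp(xs):
    # Numerically stable log(sum(exp(x) for x in xs))
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sparse_lda_loglik(triplets, theta, phi):
    """Sum over nonzero counts of n_dv * log(sum_k theta[d][k] * phi[k][v]).

    triplets: iterable of (doc_index, word_index, count), nonzero counts only
    theta:    D x K nested list of document-topic probabilities
    phi:      K x V nested list of topic-word probabilities
    """
    num_topics = len(phi)
    total = 0.0
    for d, v, n in triplets:
        log_terms = [
            math.log(theta[d][k]) + math.log(phi[k][v])
            for k in range(num_topics)
        ]
        total += n * logsumexp(log_terms)
    return total


# Toy data (hypothetical): 2 documents, 2 topics, 2 vocabulary words
theta = [[0.5, 0.5], [0.9, 0.1]]
phi = [[0.2, 0.8], [0.6, 0.4]]
triplets = [(0, 0, 2), (1, 1, 3)]  # only the nonzero counts are stored

ll = sparse_lda_loglik(triplets, theta, phi)
```

Memory here scales with the number of nonzero counts rather than with D x V, which is the behaviour I am hoping to get from the Theano version.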
Help much needed.
Thanks in advance.