Using dataset as observed in likelihood

gaddamanil16 · September 7, 2018, 4:46am

Hi,

@junpenglao
I am working with a very large dataset for LDA and want to use all of it at once for posterior inference. I am trying to use theano.sparse.as_sparse_variable so that I can pass the result as observed in my log-likelihood calculation.

import pymc3 as pm
import theano
import theano.sparse
import theano.tensor as t

theano_sparse_data = theano.sparse.as_sparse_variable(sparse_data)

def log_lda(theta, phi):
    def ll_lda(value):
        # document and vocabulary indices of the nonzero counts
        dixs, vixs = value.nonzero()
        vfreqs = value[dixs, vixs]
        ll = vfreqs * pm.math.logsumexp(t.log(theta[dixs]) + t.log(phi.T[vixs]), axis=1).ravel()
        return t.sum(ll)
    return ll_lda

with model:
    theta = pm.Dirichlet("thetas", a=alpha, shape=(D, K))
    phi = pm.Dirichlet("phis", a=beta, shape=(K, V))
    doc = pm.DensityDist('doc', log_lda(theta, phi), observed=theano_sparse_data)

I am trying to find a way to use something like this so that I can use all my data at once.

P.S.: I can't convert the matrix to dense because it runs into a MemoryError.

Help much needed.

Thanks in advance.

junpenglao · September 7, 2018, 9:19am

Try supplying the observed value as a three-column NumPy array, with the first and second columns being the row and column indices and the last column being the value:

def log_lda(theta, phi, value):
    # value[:, 0]: document index, value[:, 1]: word index, value[:, 2]: count
    ll = value[:, 2] * pm.math.logsumexp(t.log(theta[value[:, 0]]) + t.log(phi.T[value[:, 1]]), axis=1).ravel()
    return t.sum(ll)

with model:
    ...
    doc = pm.DensityDist('doc', log_lda, observed=dict(theta=theta, phi=phi, value=sparse_data))

Also, don't use Theano sparse: its support is limited and it does not allow you to do everything.
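
For anyone following along, here is a minimal sketch of how such a three-column array could be built, assuming the counts start out as a SciPy sparse document-by-word matrix. The helper names `sparse_to_triplets` and `lda_loglik_numpy` and the toy data are hypothetical, not part of PyMC3 or the posts above. The second helper is only a plain NumPy reference for the same sum, handy for checking the Theano version on a small example.

```python
import numpy as np
import scipy.sparse as sp
from scipy.special import logsumexp


def sparse_to_triplets(counts):
    """Return an (nnz, 3) integer array of (doc index, word index, count)."""
    coo = counts.tocoo(copy=True)
    coo.sum_duplicates()    # merge repeated (row, col) entries
    coo.eliminate_zeros()   # drop explicitly stored zeros
    return np.column_stack([coo.row, coo.col, coo.data]).astype(np.int64)


def lda_loglik_numpy(theta, phi, triplets):
    """NumPy reference for the log-likelihood over the nonzero counts."""
    d, v, n = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    # log sum_k theta[d, k] * phi[k, v] for every nonzero (d, v) pair
    log_p = logsumexp(np.log(theta[d]) + np.log(phi.T[v]), axis=1)
    return float(np.sum(n * log_p))


# Toy data (illustrative only): 3 documents, 4 vocabulary words, 2 topics
counts = sp.csr_matrix(np.array([[2, 0, 1, 0],
                                 [0, 3, 0, 0],
                                 [1, 0, 0, 4]]))
triplets = sparse_to_triplets(counts)

rng = np.random.default_rng(0)
theta = rng.dirichlet(np.ones(2), size=3)   # shape (D, K)
phi = rng.dirichlet(np.ones(4), size=2)     # shape (K, V)
ll = lda_loglik_numpy(theta, phi, triplets)
```

Only the nonzero entries are stored, so memory grows with the number of nonzeros rather than with D × V, and the integer index columns can be used directly for indexing theta and phi.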

gaddamanil16 · September 10, 2018, 4:27pm

Hi,

Thanks for the reply. It worked; I only had to slightly modify the way the observed data is fed in.

Again thanks a ton!
