I am implementing an LDA model on a dataset, for which the model is:
```python
theta = pm.Dirichlet("thetas", a=alpha, shape=(D, K))
phi = pm.Dirichlet("phis", a=beta, shape=(K, V))
z = pm.Categorical("zx", p=theta, shape=(W, D))
w = pm.Categorical("wx", p=t.reshape(phi[z.T], (D * W, V)),
                   observed=data)  # data: the flattened word observations
```
I have very large values of D and W, around 90,000 and 8,000 respectively. When I try to run the model it throws a "cannot allocate memory" error. I have 16 GB of RAM on my computer.
How should I handle such cases?
Help much appreciated.
Hmmm, this is going to be difficult to solve. I suggest you divide the data into batches, run inference on each batch, and combine the results later.
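A rough sketch of that batching idea (the batch size here is arbitrary, and the per-batch model fit is left as a comment since it would just be the LDA model above built on the subset):

```python
import numpy as np

def split_into_batches(n_docs, batch_size):
    """Split document indices 0..n_docs-1 into consecutive batches."""
    idx = np.arange(n_docs)
    return [idx[i:i + batch_size] for i in range(0, n_docs, batch_size)]

batches = split_into_batches(90_000, 5_000)
# For each batch of documents: build the LDA model on just that subset,
# run inference, and keep the estimated topic-word matrix (phi).
phi_estimates = []  # would hold one K x V array per batch, to combine later
```

Combining could be as simple as averaging the per-batch phi estimates, though topic alignment across batches is a real issue (topic k in one batch need not correspond to topic k in another).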
If I split the data into batches and run the model in a for loop, inference in each iteration is based only on the random variables for that mini-batch. How do I then make the final, combined inference? Is using a for loop over batches like this a good option?
Can I use `pm.Minibatch` even with MCMC sampling? If yes, can you please provide an example?
You can have a look at https://arxiv.org/pdf/1505.02827.pdf and https://arxiv.org/pdf/1501.03326.pdf
And no, I don't think using `pm.Minibatch` with MCMC sampling is a good idea, especially if you use a gradient-based sampler; see https://arxiv.org/abs/1502.01510