Running with minibatches (memory constraints)

Hi all,

I’m looking for some help understanding how ‘Minibatch’ actually works. I was under the impression that it just creates indicators/identifiers for subsets of the training data, updates the parameter estimates using that subset (I’m using a MeanField approximation with SGD as the optimiser), and then moves on to the next subset. That way, RAM requirements wouldn’t really scale with training set size, because there’s no real need to create anything in memory that’s much larger than the original dataset?

Instead, RAM usage goes up as I add more training samples, and changing the minibatch size only seems to offer minimal control over it.

I’m trying to implement a model like this one in the documentation, which includes replacements = {doc_t: doc_t_minibatch}.
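
To make it concrete, the bare-bones pattern I’m trying to copy looks something like this (a stripped-down sketch with a dummy Normal likelihood and toy data, not my actual model):

import numpy as np
import pymc3 as pm
from theano import shared

# toy "documents": a D x V matrix of counts, purely for illustration
data = np.random.randint(0, 5, size=(1000, 50)).astype('float64')
D, V = data.shape
batch_size = 200

doc_t_minibatch = pm.Minibatch(data, batch_size)  # random batch_size-row slices
doc_t = shared(data[:batch_size])                 # placeholder the model is built on

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0., sd=10., shape=V)
    pm.Normal('obs', mu=mu, sd=1., observed=doc_t, total_size=D)

with model:
    approx = pm.MeanField()
    inference = pm.KLqp(approx)
    inference.fit(1000, more_replacements={doc_t: doc_t_minibatch})

My understanding is that the fit call swaps doc_t for doc_t_minibatch in the objective, so each gradient step only ever sees a batch_size-row slice.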

I’m confused and can’t really work it out from the documentation, so any help would be greatly appreciated!

Thanks a bunch

Just as a bit more info, it’s supposed to be an LLDA model that looks something like this:

import pymc3 as pm
import numpy as np
import scipy.sparse as sps
import theano.tensor as tt
from pymc3.distributions.transforms import t_stick_breaking
from theano import shared
import theano
theano.config.compute_test_value = 'off'

class LLDA_model_pymc3:
    """Takes in a sparse matrix of feature vectors and a dataframe of labels."""

    def __init__(self, word_counts, feature_names, labels):
        self.wordCounts = word_counts
        self.feature_names = feature_names
        self.labels = labels
        self.nTopics = labels.shape[1]        # K
        self.vocabLen = word_counts.shape[1]  # V
        self.nDocs = word_counts.shape[0]     # D
        self.nTokens = np.sum(word_counts[word_counts.nonzero()])

    def build_pymc3_model(self, minibatchSize=200):
        self.minibatchSize = minibatchSize

        def logp_lda_doc(beta, theta):
            """Returns the log-likelihood function for given documents.

            K : number of topics in the model
            V : number of words (size of vocabulary)
            D : number of documents (in a mini-batch)

            Parameters
            ----------
            beta : tensor (K x V)
                Word distributions.
            theta : tensor (D x K)
                Topic distributions for documents (set as strong Dirichlet for supervised model)
            """
            def docLikelihoodFunction(docs):
                documentIndex, vocabIndex = docs.nonzero()
                vocabFreqs = docs[documentIndex, vocabIndex]
                docLikelihood = vocabFreqs * pm.math.logsumexp(
                    tt.log(theta[documentIndex]) + tt.log(beta.T[vocabIndex]), axis=1).ravel()

                # per-word log-likelihood * num of tokens in the whole dataset
                return tt.sum(docLikelihood) / tt.sum(vocabFreqs) * self.nTokens

            return docLikelihoodFunction

        self.doc_t_minibatch = pm.Minibatch(self.wordCounts.toarray(), minibatchSize)
        self.doc_t = shared(self.wordCounts.toarray()[:minibatchSize], borrow=True)
        self.topic_t = shared(np.asarray(self.labels)[:minibatchSize], borrow=True)
        self.topic_t_minibatch = pm.Minibatch(np.asarray(self.labels), minibatchSize)

        with pm.Model() as model:
            beta = pm.Dirichlet('beta',
                                a=pm.floatX((1.0 / self.nTopics) * np.ones((self.nTopics, self.vocabLen))),
                                shape=(self.nTopics, self.vocabLen),
                                transform=t_stick_breaking(1e-9))
            doc = pm.DensityDist('doc', logp_lda_doc(beta, self.topic_t), observed=self.doc_t)

        self.model = model

    def inference(self, n_steps=10000, start_learn_rate=0.1):
        try:
            self.model
        except AttributeError:
            print("No pymc model has been defined")
        else:
            n = start_learn_rate
            s = shared(n)

            def reduce_rate(a, h, i):
                s.set_value(n / ((i / self.minibatchSize) + 1) ** .7)

            with self.model:
                approx = pm.MeanField()
                approx.scale_cost_to_minibatch = False
                inference = pm.KLqp(approx)

            inference.fit(n_steps, callbacks=[reduce_rate], obj_optimizer=pm.sgd(learning_rate=s),
                          total_grad_norm_constraint=200,
                          more_replacements={self.doc_t: self.doc_t_minibatch,
                                             self.topic_t: self.topic_t_minibatch})

            self.approx = approx

            samples = pm.sample_approx(approx, draws=100)
            self.vocab_samples = samples['beta'].mean(axis=0)

    def print_top_words(self, n_top_words=10):
        try:
            self.vocab_samples
        except AttributeError:
            print("Error, build model + perform inference first")
        else:
            for i in range(len(self.vocab_samples)):
                print(("Topic #%d: " % i) + " ".join([self.feature_names[j]
                      for j in self.vocab_samples[i].argsort()[:-n_top_words - 1:-1]]))

    def predictions(self, test_word_counts, apply_softmax=True):

        def softmax(x):
            e_x = np.exp(x - np.max(x, axis=1)[:, None])
            return e_x / e_x.sum(axis=1)[:, None]

        try:
            self.vocab_samples
        except AttributeError:
            print("Error, build model + perform inference first")
        else:
            predictions = test_word_counts.dot(self.vocab_samples.transpose())
            if apply_softmax:
                predictions = softmax(predictions)
            return predictions

Minibatch just indexes a subset of your training set. Since the training set is already in memory, using Minibatch won’t help on that front.

Using Minibatch mainly speeds up the gradient computation, which makes your training go faster.
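
For example (a quick sketch with random data), Minibatch gives you a tensor that evaluates to a different random slice of the array each time, while the full array stays in memory:

import numpy as np
import pymc3 as pm

data = np.random.rand(10000, 50)            # full training set, already in RAM
batch = pm.Minibatch(data, batch_size=200)  # symbolic random 200-row slice of it

print(batch.eval().shape)   # (200, 50)
print(batch.eval().shape)   # (200, 50), a different random slice each evaluation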

Thanks for your reply :slight_smile:

I’m still not really understanding why the RAM requirements of my model scale so heavily with more training samples.

Say my training set is 1000 samples (~1 GB in memory), and my model (constructed with minibatches) takes another 2 GB of memory.

If I bump my training set up to 100k samples (5 GB) and use the exact same model (with the same minibatch size), why would the model then take up ~30 GB? Surely it isn’t using that much memory just to store indexes for the minibatches?

In the above code, the basic model stays the same and I’m not passing it more input at once (since the minibatch size stays the same), so you’d expect the RAM requirements to grow by not much more than the increase in the training set itself?

Sorry if these are stupid questions, but I’m just a bit confused!

Thanks again

I don’t have a lot of experience with large datasets; maybe you can try profiling the memory use:
http://deeplearning.net/software/theano/tutorial/profiling.html
http://docs.pymc.io/notebooks/profiling.html

theano.config.profile = True 
theano.config.profile_memory = True 
model.profile(model.logpt).summary()
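
i.e. something along these lines with a dummy model (you would call .profile on your LLDA model instead):

import numpy as np
import theano
import pymc3 as pm

theano.config.profile = True
theano.config.profile_memory = True

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0., sd=1.)
    pm.Normal('obs', mu=mu, sd=1., observed=np.random.randn(100))

# compiles the model log-probability with profiling on and prints per-op time/memory stats
model.profile(model.logpt).summary()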

Ahhhhh, I know what’s happening now!

The casting of the sparse matrix to dense happens inside the "build_pymc3_model" function that I was running:

self.doc_t_minibatch = pm.Minibatch(self.wordCounts.toarray(), minibatchSize)
self.doc_t = shared(self.wordCounts.toarray()[:minibatchSize], borrow=True)

wordCounts gets expanded to a full dense array inside that function, and that’s why the RAM is exploding. I should only be casting each minibatch to dense as it’s needed; instead I’m densifying the whole thing and then setting up indexes for the minibatches on top of that. Nothing to do with how PyMC3/minibatches work, just my own stupidity.
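
In case anyone else hits this, the direction I’m going in is roughly the following (a sketch, not tested end-to-end): keep the matrix sparse, hold the current batch in a shared variable, and densify only that batch inside a callback, since as far as I can tell pm.Minibatch wants the whole array dense up front.

import numpy as np
import scipy.sparse as sps
from theano import shared

# sparse D x V count matrix stays sparse in RAM; only one batch is ever dense
word_counts = sps.random(10000, 2000, density=0.01, format='csr')
batch_size = 200

# placeholder holding the current (dense) minibatch; build the model on this
doc_t = shared(word_counts[:batch_size].toarray())

def next_batch(approx, losses, i):
    """Callback: pick a random batch of rows and densify only those rows."""
    idx = np.random.randint(0, word_counts.shape[0], batch_size)
    doc_t.set_value(word_counts[idx].toarray())

Then pass callbacks=[next_batch, reduce_rate] to inference.fit and drop the pm.Minibatch / more_replacements bits, so the full dense array is never created.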

Thanks a lot for your help! Would have taken me a lot longer to realise without you :slight_smile:
