I’m trying to train a hierarchical regression model with ADVI on a big data set (10 million+ rows, 100+ categories, 50+ features).
I’m also using pymc3_models.
I’m able to load all the data in memory, and when I finally fit the model I use a minibatch_size of 100.
Since the whole array fits in memory, I’d expect training to be fast, but I’m still getting 13 seconds per iteration. Sampling from the NumPy array should be fast, so I’m not sure what the issue is.
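
To check the assumption that drawing a minibatch is cheap, here is a minimal sketch that times random 100-row draws from an in-memory NumPy array. The array is synthetic stand-in data and much smaller than the real data set, and `time_minibatch_draws` is a hypothetical helper, not part of pymc3 or pymc3_models. It only measures plain NumPy indexing and does not reproduce what those libraries do internally.

```python
import time
import numpy as np

def time_minibatch_draws(data, batch_size=100, n_draws=1000, seed=0):
    """Return (mean seconds per random minibatch draw, shape of one batch)."""
    rng = np.random.default_rng(seed)
    start = time.perf_counter()
    for _ in range(n_draws):
        # Sample row indices with replacement, then copy those rows out
        idx = rng.integers(0, data.shape[0], size=batch_size)
        batch = data[idx]
    elapsed = time.perf_counter() - start
    return elapsed / n_draws, batch.shape

# Synthetic stand-in: 100,000 rows x 50 features (assumed sizes, not the real data)
X = np.random.default_rng(1).normal(size=(100_000, 50))
seconds_per_draw, batch_shape = time_minibatch_draws(X)
print(f"{seconds_per_draw:.2e} s per minibatch of shape {batch_shape}")
```

If the time per draw is far below 13 seconds, the slowdown is probably somewhere other than the indexing step.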
Does anyone have any suggestions on how I can improve this?