Thanks a lot for your advice. The first script actually runs pretty quickly, taking ~1 min even on my large input data (~5 million rows, 16 columns). I can easily see myself incorporating this into my PyMC3 workflow.
As for the more robust alternative you mentioned that uses ADVI, it is far slower than the first script, taking ~3 hours to run. Is it supposed to be like that?
dycontri, I knew techniques like this must have been published somewhere, so my Google skills must have failed me. I will look into it, thanks.