Hi — I’m Yicheng Yang, a junior at UIUC (CS + Statistics + Economics), applying for GSoC 2026. I wanted to share my thinking on the streaming variational inference direction and get some early feedback.
Background
I'm taking STAT 432 (Statistical Learning) at UIUC, which covers stochastic gradient methods, and I've been working through the Baruch Pre-MFE Numerical Linear Algebra seminar, which covers L-BFGS and quasi-Newton convergence. The practical motivation comes from my own projects: I built a real-time trading system for prediction markets that ingests continuous data streams, and I maintain clawdfolio, a portfolio analytics package that processes financial time series that regularly exceed memory on a standard machine. I know firsthand what it feels like when an analysis pipeline hits a memory wall.
What I’ve Explored So Far
I’ve been reading through pymc/variational/ to understand the existing infrastructure:
- The `MeanField` approximation and the ADVI inference loop in `inference.py` use `self.approx.logp_nojac`, which calls into the PyTensor graph over the full dataset. There is no batching at the ELBO level; the existing `pm.Minibatch` approach works at the data-indexing level but assumes the full array is pre-loaded.
- `Pathfinder` uses L-BFGS to walk toward the posterior mode, then approximates the inverse Hessian from the trajectory. The optimizer is deterministic, with no stochastic gradient support.
- `MinibatchRV` and the `Minibatch` class in `data.py` are the natural extension points. The scaling factor (`N / batch_size`) is already computed; the missing piece is feeding batches from an iterator rather than random-sampling from a pre-loaded array.
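To check my understanding of that scaling factor, here is a small NumPy sketch (deliberately independent of PyMC itself) verifying that scaling the batch log-likelihood by `N / batch_size` gives an unbiased estimate of the full-data log-likelihood term that enters the ELBO:

```python
import numpy as np

rng = np.random.default_rng(42)
N, batch_size = 100_000, 256
data = rng.normal(loc=1.0, scale=2.0, size=N)

def gaussian_logp(x, mu=1.0, sigma=2.0):
    # Pointwise Gaussian log-density.
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

full_logp = gaussian_logp(data).sum()

# Average the scaled-batch estimator over many random batches:
# (N / batch_size) * sum(logp over batch) should match full_logp on average.
estimates = [
    (N / batch_size) * gaussian_logp(rng.choice(data, batch_size, replace=False)).sum()
    for _ in range(500)
]

print(full_logp, np.mean(estimates))  # the two should be close
```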
Proposed Approach
The core idea: wrap an arbitrary Python iterator in a `StreamingDataset` object that handles the N-scaling problem and plugs into the existing `pm.Minibatch` infrastructure. From there:
- Streaming ADVI: replace the full-data ELBO call with a scaled batch ELBO, consume data from the iterator, and add a CUSUM-based convergence monitor (standard ELBO plateau checks don't work well for streaming, since the distribution can shift).
- Streaming Pathfinder: adapt L-BFGS to stochastic gradients using the overlap-correction technique from Moritz et al. (2016), which draws two independent batches per step to get an unbiased curvature estimate.
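To make the wrapper idea concrete, here is a minimal sketch of what it could look like. Everything here (`StreamingDataset`, `approximate_n`, the running count) is a hypothetical design of mine, not existing PyMC API; it only illustrates how an `N / batch_size` scale factor would be attached to each batch pulled from an iterator.

```python
from typing import Iterator, Optional

import numpy as np

class StreamingDataset:
    """Hypothetical wrapper (illustrative, not PyMC API): turn an
    arbitrary batch iterator into a source of (batch, scale) pairs,
    where scale = N / batch_size is the factor applied to the batch
    log-likelihood in a minibatch ELBO."""

    def __init__(self, source: Iterator[np.ndarray],
                 approximate_n: Optional[int] = None):
        self.source = source
        self.approximate_n = approximate_n
        self.seen = 0  # running count, used when N is unknown

    def __iter__(self):
        for batch in self.source:
            self.seen += len(batch)
            # Known stream length if given, otherwise a crude online
            # estimate: the number of points seen so far.
            n = self.approximate_n if self.approximate_n is not None else self.seen
            yield batch, n / len(batch)

# Usage: a stream of 1000 batches of 64 points, total size known up front.
stream = (np.random.default_rng(i).normal(size=64) for i in range(1000))
first_batch, scale = next(iter(StreamingDataset(stream, approximate_n=64_000)))
```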
Questions for Rob
- On the `Minibatch` scaling: the current implementation assumes `N` is known. For streams of unknown total size, would you prefer an explicit `approximate_n` parameter, or an adaptive estimator that updates `N` online?
- Architectural scope: is there appetite for modifying the Pathfinder optimizer directly, or would it be cleaner to leave Pathfinder untouched and build a parallel `StreamingPathfinder` class?
- Stability under high gradient noise: are there known issues with the `MeanField` + Adam combination at very small batch sizes that I should account for?
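Finally, a sketch of the two-independent-batch curvature estimate mentioned under Streaming Pathfinder, as I currently read it (a plain SGD step stands in for the full L-BFGS two-loop recursion, and all names are illustrative): one batch drives the step, and a second, independent batch is used to evaluate the gradient at both the old and new iterates, so the curvature pair `(s, y)` is internally consistent and its noise is decoupled from the step direction.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5_000
X = rng.normal(size=(N, 3))
true_w = np.array([1.0, -2.0, 0.5])
y_obs = X @ true_w + 0.1 * rng.normal(size=N)

def batch_grad(w, idx):
    # Gradient of 0.5 * mean squared error over the rows in `idx`.
    Xb, yb = X[idx], y_obs[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

w = np.zeros(3)
pairs = []  # (s, y) curvature pairs, as kept in L-BFGS memory
lr, batch_size, memory = 0.05, 256, 10
for step in range(300):
    step_idx = rng.choice(N, batch_size, replace=False)  # batch 1: step
    curv_idx = rng.choice(N, batch_size, replace=False)  # batch 2: curvature
    w_new = w - lr * batch_grad(w, step_idx)
    s = w_new - w
    # Both gradients in y use the SAME independent batch, so y reflects
    # curvature rather than batch-to-batch sampling noise.
    y_pair = batch_grad(w_new, curv_idx) - batch_grad(w, curv_idx)
    if s @ y_pair > 1e-10:  # keep only positive-curvature pairs
        pairs.append((s, y_pair))
        pairs = pairs[-memory:]
    w = w_new

print(w)  # should approach true_w
```

If this reading matches what you had in mind, the streaming version would feed `curv_idx` from the same iterator as the step batches, which is part of why the `StreamingDataset` question above matters for Pathfinder too.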