Hi! I'm Sanaica, and I've recently been following the Streaming/Online Inference project idea for GSoC 2026.
Over the last few days, I've been benchmarking how PyMC handles continuously arriving data (e.g., tick data from the yfinance API). My goal was to bypass the graph-recompilation bottleneck that occurs when feeding in real-time data.
By pairing mutable pm.Data containers with pm.ADVI(), I built a continuous while loop that ingests real-time inputs (tick-by-tick prices plus exogenous sentiment scores), updates the underlying variational parameters, and samples the posterior predictive without stopping or recompiling.
As the benchmark graph shows, the streaming architecture drops latency from ~12.0 seconds per update (traditional MCMC refitting) to ~0.1 seconds per update.
Here is a snapshot of the live dashboard generated by the streaming architecture, showing the continuous API inputs automatically updating the Bayesian probability bands (expected return μ and uncertainty ±1σ) within milliseconds:
- Top panel (Price & Signals): a black line showing the Bitcoin price, overlaid with colored dots (green for buy, red for sell).
- Middle panel (Whale Sentiment): a red/green bar chart showing simulated billionaire 13F data (scaled from -1 to 1).
- Bottom panel (The PyMC Math): an orange line showing the expected return.
As I draft a formal GSoC proposal, I want to make sure my focus aligns with the core team's roadmap. Would you prefer a proposal focused on (a) a formal "Streaming Adapter/Wrapper" that standardizes how PyMC ingests live Python generators or Dask streams, or (b) the math backend, i.e., recursive Bayesian updating (using the current posterior as the new prior)?
Any guidance on which is higher priority for 2026 would be appreciated!
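To make the second option concrete, here's a toy sketch of recursive updating in the conjugate Normal-with-known-variance case (plain NumPy, illustrative names). In this conjugate setting, feeding batches sequentially with the posterior as the next prior recovers exactly the same posterior as processing all data at once:

```python
import numpy as np

def normal_update(mu0, tau0_sq, y, sigma_sq):
    """Conjugate update: prior N(mu0, tau0_sq), data y ~ N(mu, sigma_sq).

    Returns the posterior mean and variance for mu.
    """
    prec = 1.0 / tau0_sq + len(y) / sigma_sq
    mu_post = (mu0 / tau0_sq + y.sum() / sigma_sq) / prec
    return mu_post, 1.0 / prec

rng = np.random.default_rng(42)
sigma_sq = 1.0
data = rng.normal(2.0, np.sqrt(sigma_sq), size=300)

# Streaming: the posterior after each batch becomes the next prior.
mu, tau_sq = 0.0, 100.0  # weak initial prior
for batch in np.split(data, 10):  # ten batches of 30 "ticks"
    mu, tau_sq = normal_update(mu, tau_sq, batch, sigma_sq)

# Batch: all 300 observations at once, same initial prior.
mu_all, tau_all = normal_update(0.0, 100.0, data, sigma_sq)
print(mu, mu_all)  # identical up to floating-point error
```

Outside conjugate families the recursion needs an approximation step (e.g., fitting a parametric posterior and reusing it as the prior), which I'd expect to be the interesting part of the project.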

